Abstract

We examine in this paper the problem of image registration from the new perspective where images are 
given by sparse approximations in parametric dictionaries of geometric functions. We propose a registration 
algorithm that looks for an estimate of the global transformation between sparse images by examining the 
set of relative geometrical transformations between the respective features. We propose a theoretical analysis 
of our registration algorithm and we derive performance guarantees based on two novel important properties 
of redundant dictionaries, namely the robust linear independence and the transformation inconsistency. We 
propose several illustrations and insights about the importance of these dictionary properties and show that 
common metrics such as coherence or restricted isometry property fail to provide sufficient information in
registration problems. We finally show with illustrative experiments on simple visual objects and handwritten 
digits images that our algorithm outperforms baseline competitor methods in terms of transformation- 
invariant distance computation and classification. 

1 Introduction

With the ever-increasing quantity of information produced by sensors, efficient processing techniques for
identifying meaningful information in high-dimensional data sets are becoming crucial. One of the key challenges
is to be able to identify relevant objects captured at different times, from various viewpoints, or by different
sensors. Sparse signal representation generally helps in reducing the dimensionality of the data analysis prob- 
lem. In general, it is however necessary to align signals in order to derive meaningful comparisons or distances 
in the analysis. Image registration thus represents a crucial yet non-trivial task in many image processing and 
computer vision applications, such as object detection, localization and classification to name a few.

In this paper, we propose a registration algorithm for sparse images that are given as a linear combination
of geometric features from a parametric dictionary. The estimation of the global geometric transformation
between images is performed first by forming a set of candidate solutions with the relative transformations
between features in each image. The transformation that leads to the smallest transformation-invariant distance
is then simply selected as the global transformation estimate. While image registration is generally a complex
optimisation problem, our algorithm offers a low complexity solution when the images have a small number of
constitutive components. We analyze theoretically its performance, which mainly depends on the construction
of the dictionary that supports the sparse image representations. We introduce two novel properties for
redundant dictionaries, namely the robust linear independence and transformation inconsistency, which make it
possible to characterise the performance of the registration algorithm. The benefits of these properties are studied in detail
and compared to common metrics such as the coherence or the restricted isometry property. We finally provide 
illustrative registration and classification experiments, where our algorithm outperforms baseline solutions from 
the literature, in particular for large geometrical transformations. 

Image registration has been widely investigated in the literature, but not from the point of view of sparse 
image approximations. There exist essentially three main classes of methods to achieve invariance to geometric
transformations: direct (pixel based) methods, feature-based methods, and methods that incorporate the
invariance in the distance measure [16]. For a general survey on image alignment, we refer the reader to [20], [16].

First, direct methods simply consist in trying all candidate transformations and measuring how well the pixels agree
when the images are transformed relative to each other. A major drawback of these methods is their inefficiency
when the number of candidate transformations becomes large. Therefore, hierarchical coarse-to-fine techniques 
based on image pyramids have been developed [2] to offer a compromise between accuracy and computational 
complexity. 

Then, the popular feature-based approaches [17] represent a more efficient class of methods. They are
usually built on several steps: (i) feature detection, which searches for distinctive locations in the images, (ii)
feature description, which provides a description of each detected location with an invariant descriptor, (iii)
feature matching and (iv) transformation estimation, which estimates the global transformation by looking at
matched features. Note that it is crucial in this class of methods to describe the features in a
transformation-invariant way for easier matching. We refer the reader to [14] for a comparison of the main methods.
A widely used framework with the above properties relies on SIFT features [H], which offer invariance to affine
transformation (i.e., translation, rotation and scaling) in the image representation. Even though these features
have been very successful in many computer vision applications, they are mostly built on empirical results 
and several parameters need to be set manually. Furthermore, feature-based methods are not well suited for 
estimating large transformations between the target images, as the matching accuracy and keypoint localization 
degrade for large transformations. 

Finally, an alternative to building invariant representations of the image as in feature-based methods consists 
in estimating a transformation-invariant distance between images. The transformation-invariant distance is 
defined as the minimum distance between the possible transformations of two patterns. In general, the signals 
generated by the possible transformations of a pattern can be represented by a non linear manifold. Computing 
the transformation-invariant distance between two patterns or equivalently the manifold distance is thus a 
difficult problem in general. The authors in [15] locally approximate the transformation-invariant distance with
the distance between the linear spaces that are tangent to the two manifolds. Vasconcelos et al. [12] go beyond
the limitations of local invariance in tangent distance methods by embedding the tangent distance computation
in a multiresolution framework. Kokiopoulou et al. [11] achieve global invariance by approximating the
original pattern with a linear combination of atoms from a parametric dictionary. Thanks to this approximation,
the manifold is given in a closed form and the objective function becomes equal to a difference of convex functions
that can be globally minimized using cutting plane methods. Unfortunately, this class of optimization methods
has a slow convergence rate, which limits its use in practical settings.

In this paper, we propose to examine the image registration problem from a novel perspective, that is 
the sparse approximations of images in parametric dictionaries of geometric functions. Unlike the existing 
methods, this approach guarantees invariance to transformations of arbitrary magnitude and is generic with 
respect to the transformation group considered in the registration. Through its detailed analysis, the proposed 
framework further provides useful insight on the connections between image registration problems and sparse 
signal processing. 

The rest of this paper is organised as follows. In Section 2, we formulate the problem of registration of
sparse images and present our registration algorithm. Section 3 proposes a theoretical performance analysis
of our algorithm, and introduces two new dictionary properties. We finally present illustrative experiments in
Section 4.

2 Registration of sparse images 
2.1 Preliminaries 

We first define the notations and conventions used in this paper. We denote respectively by $\mathbb{R}$, $\mathbb{R}_+$, $\mathbb{R}_+^*$ the set of
real numbers, the set of non negative real numbers and the set of positive real numbers. We consider images to
be continuous functions in $L^2 = \{f : \mathbb{R}^2 \to \mathbb{R} : \int_{\mathbb{R}^2} |f(x)|^2 dx < \infty\}$. We denote the scalar product associated
with $L^2$ as $\langle f, g \rangle = \int_{\mathbb{R}^2} f(x) g(x) dx$, and the norm by $\|f\|_2 = (\int_{\mathbb{R}^2} |f(x)|^2 dx)^{1/2}$. Then, we define $\mathcal{T}$ to be a
transformation group and denote by $\circ$ its associated composition rule. We consider that the group $\mathcal{T}$ includes
the transformations between pairs of images in our registration problem. We represent any transformation $\eta \in \mathcal{T}$
by a vector in $\mathbb{R}^P$ (where $P$ denotes the dimension of $\mathcal{T}$) containing the parameters of the transformation.

Alternatively, we represent a transformation $\eta \in \mathcal{T}$ with its unitary representation $U(\eta)$ in $L^2$. Therefore,
for any $\eta \in \mathcal{T}$, $U(\eta)$ is the function that maps an image $f$ to its transformed image $U(\eta)f \in L^2$ by $\eta$. Moreover,
as $U(\eta)$ is a unitary operator, we have $\|U(\eta)f\|_2 = \|f\|_2$. In order to avoid heavy notations, we also use $f_\eta$ to
denote $U(\eta)f$. We give in Table 1 some examples of transformation groups and their unitary representation in $L^2$.



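Group                          | Parameters $\eta$ | Composition $\eta \circ \eta'$                  | Operator $U(\eta)f = f_\eta$
Translation group              | $b$               | $b + b'$                                        | $f(x_1 - b_1, x_2 - b_2)$
Special Euclidean group SE(2)  | $(b, \theta)$     | $(b + R_\theta b', \theta + \theta')$           | $f(R_{-\theta}(x - b))$
Similarity group SIM(2)        | $(b, a, \theta)$  | $(b + a R_\theta b', a a', \theta + \theta')$   | $\frac{1}{a} f(\frac{1}{a} R_{-\theta}(x - b))$

Table 1: Examples of transformation groups and their unitary representation in $L^2$. Parameters with a prime
are associated with a secondary transformation $\eta'$, and $R_\theta$ denotes the rotation matrix with angle $\theta$.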

The first group in Table 1 is the group of translations in the plane. The Special Euclidean group SE(2) is the group
of translations and rotations in the plane. Its dimension is equal to 3 (2 degrees of freedom are associated
with the translation and one is associated with rotation). The similarity group SIM(2) of the plane is the set
of transformations consisting of translations, isotropic dilations and rotations. This group is of particular
importance in transformation-invariant image processing since it contains the basic transformations we usually
want to be invariant to.

Finally, if $c \in \mathbb{R}^n$ and $1 \le p < \infty$, we denote by $\|c\|_p$ the $\ell_p$ norm of $c$ defined by $\|c\|_p = (\sum_{i=1}^n |c_i|^p)^{1/p}$.
Note that the notation $\|\cdot\|_2$ is overloaded since it denotes either the continuous $L^2$ norm or the discrete $\ell_2$
norm. However, the distinction between both cases will be clear from the context.
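
The composition rule of the similarity group listed in Table 1 can be checked numerically. The following is a minimal sketch, not part of the original development: the helper names `rot`, `compose`, `inverse` and `act` are hypothetical, and `act` assumes that a transformation $\eta = (b, a, \theta)$ moves a point $x$ to $a R_\theta x + b$.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def compose(eta, eta_p):
    """Composition rule of SIM(2): eta o eta' = (b + a R_theta b', a a', theta + theta')."""
    (b, a, th), (bp, ap, thp) = eta, eta_p
    return (b + a * rot(th) @ bp, a * ap, th + thp)

def inverse(eta):
    """Inverse element, so that compose(eta, inverse(eta)) is the identity (0, 1, 0)."""
    b, a, th = eta
    return (-(rot(-th) @ b) / a, 1.0 / a, -th)

def act(eta, x):
    """Assumed point map associated with eta: x -> a R_theta x + b."""
    b, a, th = eta
    return a * rot(th) @ x + b

eta = (np.array([1.0, -2.0]), 2.0, 0.3)
eta_p = (np.array([0.5, 4.0]), 0.5, -1.1)
x = np.array([0.7, 0.2])

# Applying eta' first and then eta should match the composed parameters.
lhs = act(compose(eta, eta_p), x)
rhs = act(eta, act(eta_p, x))
identity = compose(eta, inverse(eta))
print(np.allclose(lhs, rhs), identity)
```

Setting $a = 1$ in these helpers gives the SE(2) rule, and additionally $\theta = 0$ gives plain translations.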

2.2 Problem formulation 

We now formulate the registration problem that we consider in the paper. Let $I_1$ and $I_2$ be two images in $L^2$.
We are interested in computing the optimal transformation between images $I_1$ and $I_2$. Hence, we formulate the
original alignment problem as follows:

(P'): Find $\eta'_0 = \arg\min_{\eta \in \mathcal{T}} \|U(\eta) I_1 - I_2\|_2$.

We denote by $d(I_1, I_2) = \|U(\eta'_0) I_1 - I_2\|_2$ the transformation-invariant distance between $I_1$ and $I_2$. It
corresponds to the regular Euclidean distance when the images are aligned optimally in the $L^2$ sense. Unfortunately,
computing the transformation $\eta'_0$ and the transformation-invariant distance $d(I_1, I_2)$ is a hard problem since the
objective function is typically non convex and exhibits many local minima.

In order to circumvent this problem, we consider that the images are well approximated by their sparse
expansion in a series of geometric functions. Specifically, let $\mathcal{D}$ be a parametric dictionary of geometric features
constructed by transforming a generating function $\phi \in L^2$ as follows:

$\mathcal{D} = \{\phi_\gamma : \gamma \in \mathcal{T}_d\} \subset L^2$, (1)

where $\mathcal{T}_d \subset \mathcal{T}$ is a finite discretization of the transformation group $\mathcal{T}$ and $\phi_\gamma = U(\gamma)\phi$ denotes the transformation
of the generating function $\phi$ by $\gamma$. We denote by $p$ and $q$ the respective $K$-sparse approximations of $I_1$ and $I_2$
in the dictionary $\mathcal{D}$:

$p = \sum_{i=1}^K c_i \phi_{\gamma_i}$ and $q = \sum_{i=1}^K d_i \phi_{\delta_i}$. (2)

Since the dictionary $\mathcal{D}$ contains features that represent potential parts of the image, we assume that coefficients
$c_i$ and $d_i$ are all non negative so that the different features do not cancel each other.

We refer to any element $\phi_\gamma$ in $\mathcal{D}$ as a feature or atom. We suppose in this paper that the generating
function is non negative. By appropriately choosing a generating function of finite effective support, each
atom in the dictionary thus corresponds to a potential part of the image. Besides, we suppose for simplicity
that $\gamma \mapsto \phi_\gamma$ defines a one-to-one mapping. This assumption means that the generating function does not have
any symmetries in $\mathcal{T}$.^1 Finally, we suppose without loss of generality that the mother function is normalized
so that $\|\phi\|_2 = 1$.

We can now reformulate the registration problem as the problem of finding the optimal relative transforma- 
tion between sparse images. In particular, we reformulate our registration problem as follows: 

(P): Find $\eta_0 = \arg\min_{\eta \in \mathcal{T}} \|U(\eta)p - q\|_2$.

The smallest distance $d(p,q) = \|U(\eta_0)p - q\|_2$ is the transformation-invariant distance computed between the
sparse image approximations $p$ and $q$. Compared to the original problem, the images $I_1$ and $I_2$ are replaced by
their respective sparse approximations p and q. This presents some potential advantages in applications where 
users do not have access to the original images; most importantly, the prior information on the support of p 
and q effectively guides the registration process, as we will see in the next paragraph. We should note that if 
the images are not well approximated by their sparse expansions, the solution of (P) may substantially differ 
from the true transformation obtained by solving (P'). 

2.3 Registration algorithm 

We now propose a novel and simple algorithm to solve the registration problem for images given by their sparse
approximations. The core idea of our registration algorithm lies in the covariance property of the dictionary
$\mathcal{D}$: a global transformation applied on the image induces an equivalent transformation on the corresponding
features. Thanks to this covariance property, it is possible to infer the global transformation between the images
by a simple computation of the relative transformations between the features in both images.

^1 We extend the one-to-one assumption on $\gamma \mapsto \phi_\gamma$ to the more general setting where the stabilizer of $\phi$, defined by
$\{\gamma \in \mathcal{T} : U(\gamma)\phi = \phi\}$, is a finite set in Appendix B.

Specifically, let $\mathcal{T}^{p,q}$ be the set of relative transformations between pairs of features taken respectively in $p$
and $q$: $\mathcal{T}^{p,q} = \{\delta_i \circ \gamma_j^{-1} : 1 \le i, j \le K\}$. We can thus estimate the relative transformation between the images
by solving the following relaxed version of (P):

$(\hat{P})$: Find $\hat{\eta} = \arg\min_{\eta \in \mathcal{T}^{p,q}} \|U(\eta)p - q\|_2$.

The minimum of the objective function, $d_a(p,q) = \|U(\hat{\eta})p - q\|_2$, is defined as the approximate
transformation-invariant distance between $I_1$ and $I_2$.

Even though problems (P) and $(\hat{P})$ share some similarities, they differ in an important aspect, that is the
search space. It is reduced from $\mathcal{T}$ to the finite set $\mathcal{T}^{p,q}$. This constrains the estimated transformation to be equal
to a transformation that exactly maps two features taken respectively from $p$ and $q$. The assumption that $\mathcal{T}$
can be replaced by $\mathcal{T}^{p,q}$ originates from the observation that features are covariant to the global transformation
applied on the original image. Even though this assumption is not necessarily true for all features when
innovation exists between the images (other than a global transformation), we expect to have at least one
feature whose transformation is consistent with the optimal transformation $\eta_0$. We analyze in detail the error
due to this assumption in Section 3. The advantage of replacing $\mathcal{T}$ by $\mathcal{T}^{p,q}$ is however immediate: we have
reduced an intractable problem to a problem whose search space is of cardinality at most $K^2$. Since $K$ is
generally chosen to be small enough, the problem $(\hat{P})$ can be efficiently solved by a full search over all the
elements of $\mathcal{T}^{p,q}$. The registration algorithm is summarised in Algorithm 1.



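Algorithm 1 Image registration algorithm

Input: sparse approximations $p = \sum_{i=1}^K c_i \phi_{\gamma_i}$ and $q = \sum_{i=1}^K d_i \phi_{\delta_i}$.

1. Construct the set $\mathcal{T}^{p,q}$:

   $\mathcal{T}^{p,q} = \{\delta_i \circ \gamma_j^{-1} : 1 \le i, j \le K\}$.

2. Estimate the transformation $\hat{\eta}$ and $d_a(p,q)$:

   $\hat{\eta} \leftarrow \arg\min_{\eta \in \mathcal{T}^{p,q}} \|U(\eta)p - q\|_2$,
   $d_a(p,q) \leftarrow \|U(\hat{\eta})p - q\|_2$.

3. Return $(\hat{\eta}, d_a(p,q))$.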



The value of $K$ controls the computational complexity of Algorithm 1: a large value of $K$ results in a
large cardinality of the search space $\mathcal{T}^{p,q}$. Furthermore, the value of $K$ also generally controls the error in the
approximation of the original images by their sparse expansions $p$ and $q$. We discuss in more detail the influence
of $K$ on our registration algorithm in Section 21. Note finally that we have supposed for simplicity that both
images $I_1$ and $I_2$ are approximated by the same number of features. However, it is easy to see that one can
generalize it to the case where the numbers of features are different in the two images. In this case, we have
$|\mathcal{T}^{p,q}| = K_1 K_2$ instead of $K^2$, where $K_1$ and $K_2$ are the numbers of features in $I_1$ and $I_2$ respectively.
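
To make the full search of Algorithm 1 concrete, the sketch below implements it for the translation group only, where $\delta_i \circ \gamma_j^{-1}$ reduces to $\delta_i - \gamma_j$. It is an illustration under stated assumptions rather than the implementation used in this paper: the isotropic Gaussian generating function, the pixel grid, the discrete $\ell_2$ norm used in place of the $L^2$ norm, and the names `atom`, `synthesize` and `register` are all hypothetical choices.

```python
import numpy as np

def atom(shift, grid, sigma=2.0):
    """Unit-norm Gaussian atom translated by `shift` (illustrative generating function)."""
    xx, yy = grid
    g = np.exp(-((xx - shift[0]) ** 2 + (yy - shift[1]) ** 2) / (2 * sigma ** 2))
    return g / np.linalg.norm(g)

def synthesize(coeffs, shifts, grid):
    """Sparse image sum_i c_i * phi_{gamma_i}, sampled on the grid."""
    return sum(c * atom(s, grid) for c, s in zip(coeffs, shifts))

def register(c, gammas, d, deltas, grid):
    """Full search over the relative translations delta_i - gamma_j."""
    q = synthesize(d, deltas, grid)
    best_eta, best_dist = None, np.inf
    for delta in deltas:
        for gamma in gammas:
            eta = delta - gamma                        # candidate relative transformation
            moved = synthesize(c, gammas + eta, grid)  # U(eta) p
            dist = np.linalg.norm(moved - q)
            if dist < best_dist:
                best_eta, best_dist = eta, dist
    return best_eta, best_dist

# Toy usage: q is an exactly translated copy of p, with K = 3 atoms.
ax = np.arange(64.0)
grid = np.meshgrid(ax, ax, indexing="ij")
c = np.array([1.0, 0.6, 0.8])
gammas = np.array([[20.0, 20.0], [25.0, 32.0], [34.0, 22.0]])
eta_true = np.array([7.0, -5.0])
eta_hat, dist = register(c, gammas, c, gammas + eta_true, grid)
print(eta_hat, dist)
```

The double loop visits at most $K^2$ candidates, which is the source of the low complexity when $K$ is small.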

In the next section, we analyse the performance of the proposed registration algorithm in different settings,
and focus in particular on the influence of the dictionary $\mathcal{D}$ on the registration performance.

3 Theoretical analysis 

In this section, we examine the penalty of relaxing the original problem (P') into $(\hat{P})$ in terms of registration
performance. We first discuss the framework and the assumptions used in our analysis. Then, we study a
simple case where the image patterns are exactly related by a (possibly very large) geometrical transformation. 
We show that under a mild assumption on the dictionary, our algorithm achieves perfect registration. We then 
extend the analysis to the general case and introduce two key properties of the dictionary (namely robust linear 
independence and transformation inconsistency). We show that under some conditions on these properties, our 
algorithm succeeds in recovering the correct relative transformation with a bounded error in the general case, 
as long as the innovation between the images (other than the global geometrical transformation) is controlled. 
We give at each step of the analysis the main intuitions and several examples to illustrate the novel notions 
introduced in our analysis. 

3.1 Analysis framework

We first define a performance metric to measure the image registration accuracy. As we want to capture the
performance of our registration algorithm with respect to the optimal image alignment obtained by solving
(P'), a natural metric consists in computing the difference between the transformation-invariant distance and
its approximate version, i.e., $E'(I_1, I_2, p, q) = |d_a(p,q) - d(I_1, I_2)|$. We however assume in this paper that the
images are given by their sparse expansions. Therefore, we use an alternative registration performance metric given by
$E(p,q) = d_a(p,q) - d(p,q)$, where we use the transformation-invariant distance computed between the sparse
image approximations $p$ and $q$ instead of the original images. Note that $E(p,q) \ge 0$ since $\mathcal{T}^{p,q} \subset \mathcal{T}$.

We relate in the following proposition the two registration metrics $E(p,q)$ and $E'(I_1, I_2, p, q)$ to the sparse
approximation errors $\|I_1 - p\|_2$ and $\|I_2 - q\|_2$.

Proposition 1. $E'(I_1, I_2, p, q) \le E(p,q) + \|I_1 - p\|_2 + \|I_2 - q\|_2$.

Proof. We have:

$E'(I_1, I_2, p, q) = |d_a(p,q) - d(I_1, I_2)|$
$= |d_a(p,q) - d(p,q) + d(p,q) - d(I_1, I_2)|$
$\le E(p,q) + |d(p,q) - d(I_1, I_2)|$,

using the triangle inequality. We now show that $|d(p,q) - d(I_1, I_2)| \le \|I_1 - p\|_2 + \|I_2 - q\|_2$. Let $\eta \in \mathcal{T}$. We
have:

$\|U(\eta)I_1 - I_2\|_2 = \|U(\eta)(p + I_1 - p) - (q + I_2 - q)\|_2$
$= \|U(\eta)p - q + U(\eta)(I_1 - p) - (I_2 - q)\|_2$.

Using the triangle inequality, we derive a lower and an upper bound as follows:

$\|U(\eta)p - q\|_2 - \|U(\eta)(I_1 - p)\|_2 - \|I_2 - q\|_2 \le \|U(\eta)I_1 - I_2\|_2 \le \|U(\eta)p - q\|_2 + \|U(\eta)(I_1 - p)\|_2 + \|I_2 - q\|_2$.

As $U(\eta)$ is a unitary operator, we have $\|U(\eta)(I_1 - p)\|_2 = \|I_1 - p\|_2$. Hence, rewriting the previous inequality, we
get:

$\|U(\eta)p - q\|_2 - \|I_1 - p\|_2 - \|I_2 - q\|_2 \le \|U(\eta)I_1 - I_2\|_2 \le \|U(\eta)p - q\|_2 + \|I_1 - p\|_2 + \|I_2 - q\|_2$. (3)

Recall that $d(p,q) = \min_{\eta \in \mathcal{T}} \|U(\eta)p - q\|_2$ and $d(I_1, I_2) = \min_{\eta \in \mathcal{T}} \|U(\eta)I_1 - I_2\|_2$. Hence, by taking the
minimum over all $\eta \in \mathcal{T}$ in Eq. (3), we obtain $|d(I_1, I_2) - d(p,q)| \le \|I_1 - p\|_2 + \|I_2 - q\|_2$, which concludes the proof of
the proposition. □
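
The key inequality of the proof, $|d(I_1, I_2) - d(p,q)| \le \|I_1 - p\|_2 + \|I_2 - q\|_2$, only relies on the transformations being unitary. As a quick sanity check, the sketch below verifies it on a toy 1-D setting in which the transformation group is replaced by cyclic shifts of a discrete signal (an assumption made here purely for illustration; cyclic shifts preserve the $\ell_2$ norm). The signals and the helper `inv_dist` are hypothetical.

```python
import numpy as np

def inv_dist(f, g):
    """Smallest l2 distance between g and all cyclic shifts of f."""
    return min(np.linalg.norm(np.roll(f, s) - g) for s in range(len(f)))

rng = np.random.default_rng(0)
n = 128
p = np.zeros(n)
p[[10, 40, 41]] = [1.0, 0.5, 0.7]        # sparse pattern
q = np.roll(p, 17)
q[60] += 0.1                             # small innovation on top of the shift
I1 = p + 0.02 * rng.standard_normal(n)   # "original" signal = sparse part + residual
I2 = q + 0.02 * rng.standard_normal(n)

gap = abs(inv_dist(I1, I2) - inv_dist(p, q))
bound = np.linalg.norm(I1 - p) + np.linalg.norm(I2 - q)
print(gap, bound, gap <= bound)
```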

When most of the energy of $I_1$ and $I_2$ is captured by $p$ and $q$ (namely when $\|I_1 - p\|_2 + \|I_2 - q\|_2$ is small),
the registration errors $E(p,q)$ and $E'(I_1, I_2, p, q)$ are approximately equal. We suppose in the rest of this section that this
condition is satisfied and we measure the registration error with $E(p,q) = d_a(p,q) - d(p,q)$. Hence we focus
exclusively in this analysis on the penalty induced by restricting the search space $\mathcal{T}$ to $\mathcal{T}^{p,q}$, that is the penalty
induced by relaxing the problem (P) into the problem $(\hat{P})$ in Section 2.3.

Before studying the registration performance, we describe additional assumptions on the discretisation of
the transformation group $\mathcal{T}$. Recall that the transformation $\eta_0$ optimally aligns $p$ and $q$ in the $L^2$ sense in
problem (P). We assume that it satisfies the following assumptions:

$\eta_0 \circ \gamma_i \in \mathcal{T}_d, \ \forall i \in \{1, \dots, K\}$, (4)
$\eta_0^{-1} \circ \delta_i \in \mathcal{T}_d, \ \forall i \in \{1, \dots, K\}$, (5)

where $\mathcal{T}_d$ is the discretization of $\mathcal{T}$ used to construct dictionary $\mathcal{D}$ as given in Eq. (1). These hypotheses state
that the atoms of $U(\eta_0)p$ and $U(\eta_0^{-1})q$ belong to the dictionary, where $U(\eta_0)p$ is the optimal alignment of $p$ with
$q$ and $U(\eta_0^{-1})q$ is the optimal alignment of $q$ with $p$. As $\eta_0$ is obviously not known beforehand, it is difficult to
verify this assumption in practice. However, we can assume that Eq. (4) and Eq. (5) hold when the parameter
space used to design $\mathcal{D}$ is discretized finely.

Finally, the assumptions in our performance analysis can be summarised as follows:

(A1): $\|I_1 - p\|_2 + \|I_2 - q\|_2 \approx 0$,
(A2): $\eta_0 \circ \gamma_i \in \mathcal{T}_d$ and $\eta_0^{-1} \circ \delta_i \in \mathcal{T}_d$, $\forall i \in \{1, \dots, K\}$.

3.2 Registration performance with exact pattern transformation

In our performance analysis, we first consider the special case where $d(p,q) = 0$. This means that there exists a
transformation $\eta_0 \in \mathcal{T}$ for which $q = U(\eta_0)p$, i.e., the sparse image approximations can be aligned exactly. We
show that in this case, our registration algorithm is able to recover the exact global transformation between $p$
and $q$, as long as any subset of size $2K$ in $\mathcal{D}$ is linearly independent. We have the following proposition:

Proposition 2. Suppose that any subset of size $2K$ in $\mathcal{D}$ is linearly independent. In this case, if $d(p,q) = 0$,
then $E(p,q) = 0$.

Proof. If $d(p,q) = 0$, then we have $\sum_{i=1}^K c_i \phi_{\eta_0 \circ \gamma_i} - \sum_{i=1}^K d_i \phi_{\delta_i} = 0$. Thanks to the linear independence of any
subset of size $2K$ in $\mathcal{D}$, for any $\gamma_i$ there exists $\delta_j$ such that $\phi_{\eta_0 \circ \gamma_i} = \phi_{\delta_j}$. Indeed, if this is not the case, we
could write $\phi_{\eta_0 \circ \gamma_i}$ as a linear combination of $2K - 1$ atoms that are all different from $\phi_{\eta_0 \circ \gamma_i}$ and that all
belong to $\mathcal{D}$ thanks to assumption (A2). This contradicts the assumption that any subset of $2K$ atoms in $\mathcal{D}$
is linearly independent. Then, since the mapping $\gamma \mapsto U(\gamma)\phi$ is a one-to-one function thanks to our dictionary
design assumption, we have $\eta_0 \circ \gamma_i = \delta_j$. Thus, $\eta_0 = \delta_j \circ \gamma_i^{-1} \in \mathcal{T}^{p,q}$ and $d_a(p,q) = \min_{\eta \in \mathcal{T}^{p,q}} \|U(\eta)p - q\|_2 =
d(p,q) = 0$. □

We can make the following remark about the design of the dictionary. The linear independence assumption
guarantees that, when two $K$-sparse signals are equal, they have at least one atom in common^2. If this condition
is violated, the patterns $U(\eta_0)p$ and $q$ can have several decompositions in the dictionary with disjoint supports.
In this case, all the features of the transformed pattern $U(\eta_0)p$ and $q$ are distinct, which generally leads to
$d_a(p,q) \ne d(p,q)$. Note that this assumption appears in many problems related to overcomplete dictionaries
since it guarantees the uniqueness of $K$-sparse decompositions [H], [3], [18].

Finally, since Proposition 2 ensures that $E(p,q) = 0$ for an exactly transformed pattern, and we have
$E'(I_1, I_2, p, q) \approx E(p,q)$ when the sparse approximation errors are not too large (Assumption (A1)), we can
guarantee that the registration error $E'(I_1, I_2, p, q)$ is small in this case.

3.3 Registration performance in the general case 
3.3.1 Bound on the registration error 

We now study the performance of our registration algorithm in the general case. The previous result only
applies to an ideal scenario since the condition $d(p,q) = 0$ is rarely satisfied in practice. There is usually some
slight innovation between the images (other than a transformation in $\mathcal{T}$), which results in a distance $d(p,q)$
that is non-zero. In addition, even when the original images are exactly related by a global transformation
(i.e., $d(I_1, I_2) = 0$), there is no guarantee that the sparse approximations can be perfectly aligned (i.e.,
$d(p,q) = 0$) due to the discretization of the dictionary.

We first define the innovation $\epsilon > 0$ between the sparse image approximations as the difference that cannot
be explained by a global geometric transformation in $\mathcal{T}$. In more detail, when $c$ and $d$ denote respectively the
coefficient vectors for patterns $p$ and $q$ following Eq. (2), the innovation is defined as the smallest positive value
$\epsilon$ such that $d(p,q) \le \epsilon \sqrt{\|c\|_2^2 + \|d\|_2^2}$.
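
Since the innovation is the smallest value satisfying this inequality, it can be written in closed form whenever the coefficient vectors are not both zero. The short derivation below makes this explicit; the numerical values are hypothetical and chosen only for illustration.

```latex
\epsilon \;=\; \frac{d(p,q)}{\sqrt{\|c\|_2^2 + \|d\|_2^2}},
\qquad \text{e.g. } d(p,q) = 0.1,\;\; c = d = (1,\, 0.5)^{\top}
\;\Longrightarrow\;
\epsilon = \frac{0.1}{\sqrt{1.25 + 1.25}} \approx 0.063 .
```

In other words, the innovation measures $d(p,q)$ relative to the $\ell_2$ norm of the stacked coefficient vector $(c, -d)$ of the residual $U(\eta_0)p - q$.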

We now turn to the main result of our paper, which is formulated in Theorem 1. This result relates the
error of the registration algorithm (Algorithm 1) to the properties of the dictionary, namely the Robust Linear
Independence (RLI) and the transformation inconsistency. It reads as follows.

Theorem 1. If $d(p,q) \le \epsilon \sqrt{\|c\|_2^2 + \|d\|_2^2}$ with $\epsilon > 0$, then:

$E(p,q) \le \alpha \rho \min(\|c\|_1, \|d\|_1)$,

when $\mathcal{D}$ is $(2K, \epsilon, \alpha)$-RLI for some $\alpha \in [0, \sqrt{2})$, and $\rho$ is the transformation inconsistency of $\mathcal{D}$.

Theorem 1 shows that robust linear independence with a small $\alpha$ and a small transformation inconsistency
are key properties of the dictionary in order to guarantee the success of our algorithm. The RLI property can
be thought of as an extension of the linear independence assumption to the case where $d(p,q) \ne 0$. Specifically,
it guarantees the existence of two approximately similar features in $U(\eta_0)p$ and $q$ when $d(p,q)$ is small. The
transformation inconsistency captures the fact that geometrical transformations have a different effect on distinct
atoms in the dictionary. We defer the proof of Theorem 1 to the appendix and we study in detail in the rest
of this section the novel RLI and transformation inconsistency properties.

^2 The linear independence of any subset of size $2K$ in the dictionary actually guarantees a stronger result: it guarantees that
any $K$-sparse signal has a unique decomposition in $\mathcal{D}$ [5]. In other words, it guarantees that when two $K$-sparse signals are equal,
all the atoms are equal.

3.3.2 Robust linear independence



We now study in more detail the novel properties that make it possible to characterise the dictionary. We first show that
the linear independence assumption introduced in Section 3.2 is no longer sufficient to bound the registration
performance in the case where $d(p,q) \ne 0$ (but close to zero). To see this, we construct a linearly independent
dictionary $\mathcal{D}$ and two sparse patterns $p$ and $q$ for which $d(p,q)$ can be made arbitrarily close to zero (i.e., $\epsilon \to 0$)
yet the registration error is large. As illustrated in Fig. 1, we consider a dictionary $\mathcal{D}$ containing four square
atoms and an additional big square atom parametrized by its position $\kappa$ relative to the four small atoms. Clearly, when
$\kappa \ne 0$, the dictionary $\mathcal{D}$ is linearly independent since one cannot write an atom as a linear combination of the
four other atoms. We consider the patterns $p = \frac{1}{2} \sum_{i=1}^4 \phi_{\gamma_i}$ and $q = \phi_{\gamma_5}$. The optimal transformation $\eta_0$ in the
example of Fig. 1 is a translation that exactly aligns $p$ and $q$ (i.e., $U(\eta_0)p = q$). However, $\eta_0$ does not satisfy the
assumptions in Eq. (5). For small $\kappa$, the transformation that best aligns the two sparse patterns and satisfies
the assumptions in Eqs. (4) and (5) is the identity. All relative transformations between features in $p$ and $q$ are
however dilations composed with translations, which result in an estimated transformation $\hat{\eta}$ in our algorithm
that is significantly different from the identity. Hence we obtain a large registration error $d_a(p,q) - d(p,q)$ in
this example. This example shows that the linear independence assumption defined in Section 3.2 is fragile: it
does not allow us to bound the registration error even when $d(p,q)$ is infinitesimally small. One needs a stronger
condition than mere linear independence to guarantee a small registration error.

Figure 1: Example where a linearly independent dictionary $\mathcal{D}$ induces a large registration error $d_a(p,q) - d(p,q)$,
when $p = \frac{1}{2}(\phi_{\gamma_1} + \phi_{\gamma_2} + \phi_{\gamma_3} + \phi_{\gamma_4})$ and $q = \phi_{\gamma_5}$. Note that $\epsilon$ (i.e., the innovation between $p$ and $q$) in this
particular example is governed by $\kappa$; by choosing small values of $\kappa$, $\epsilon$ can thus be made arbitrarily small. Note
that this dictionary is linearly independent, but not robustly linearly independent for $K = 5$ (Definition 1).

We propose to extend the notion of linear independence into a novel property called robust linear indepen- 
dence (RLI) to characterise sets of vectors. It is formally defined as follows. 



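Definition 1. Let $(\mathcal{H}, \|\cdot\|)$ be a normed space and $K \ge 1$. A family of vectors $(v_1, \dots, v_K) \in \mathcal{H}^K$ is
$(\epsilon, \alpha)$-robustly linearly independent (RLI) if any $a \in \mathbb{R}^K$ satisfies:

$\| \sum_{i=1}^K a_i v_i \| \le \epsilon \|a\|_2$ (6)

$\implies \exists\, i \ne j$ with $a_i, a_j \ne 0$, $\| \frac{a_i}{|a_i|} v_i + \frac{a_j}{|a_j|} v_j \| \le \alpha$. (7)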

In other words, when the innovation $\epsilon$ and the parameter $\alpha$ are small, any linear combination of vectors that
nearly vanishes in an RLI vector set contains at least two vectors that approximately cancel each other. The
definition of robust linear independence can be extended to dictionaries as follows.

Definition 2. A dictionary $\mathcal{D}$ is $(K, \epsilon, \alpha)$-RLI if any subset of size $K$ in $\mathcal{D}$ is $(\epsilon, \alpha)$-RLI.

In our performance analysis, we consider cases where the innovation is small. Hence, we examine the behavior
of $\alpha$ when $\epsilon$ is chosen to be small. In this particular case, we have the following definition.

Definition 3. A dictionary $\mathcal{D}$ is $K$-RLI if it is $(K, \epsilon, \alpha)$-RLI, where $\alpha$ tends to zero when $\epsilon$ approaches zero.

As an illustration, the dictionary in the previous example of Fig. 1 is not $K$-RLI for $K = 5$. Indeed, by
choosing a vector of coefficients $a = [0.5, 0.5, 0.5, 0.5, -1]^T$, the norm $\| \sum_{i=1}^5 a_i \phi_{\gamma_i} \|_2$ can be made
arbitrarily small while $\alpha = 1$. Note that
the RLI property on the dictionary has to be satisfied in order to obtain good registration performance, as it
ensures the existence of two approximately similar features (in the $L^2$ sense) in $U(\eta_0)p$ and $q$, when $d(p,q)$ is
small.
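
The failure of the Fig. 1 dictionary can be reproduced numerically. The sketch below is an illustration under assumptions that are not taken from the paper: atoms are unit-norm indicator images on a 64 x 64 pixel grid, the big square is offset by $\kappa = 1$ pixel, and the cancellation between two atoms is measured by $\|\mathrm{sign}(a_i) v_i + \mathrm{sign}(a_j) v_j\|$, which is our reading of the condition in Definition 1. The helpers `square` and `rli_quantities` are hypothetical names.

```python
import numpy as np
from itertools import combinations

def square(x0, y0, side, n=64):
    """Unit-norm indicator image of a side x side square with corner (x0, y0)."""
    img = np.zeros((n, n))
    img[x0:x0 + side, y0:y0 + side] = 1.0
    return img / np.linalg.norm(img)

def rli_quantities(atoms, a):
    """Return (||sum_i a_i v_i|| / ||a||_2, min over pairs of ||s_i v_i + s_j v_j||)."""
    a = np.asarray(a, dtype=float)
    combo = sum(ai * v for ai, v in zip(a, atoms))
    ratio = np.linalg.norm(combo) / np.linalg.norm(a)
    s = np.sign(a)
    pair = min(np.linalg.norm(s[i] * atoms[i] + s[j] * atoms[j])
               for i, j in combinations(range(len(atoms)), 2)
               if a[i] != 0 and a[j] != 0)
    return ratio, pair

kappa = 1  # offset of the big square, in pixels (assumed value)
small = [square(16 + dx, 16 + dy, 16) for dx in (0, 16) for dy in (0, 16)]
big = square(16 + kappa, 16, 32)
ratio, pair = rli_quantities(small + [big], [0.5, 0.5, 0.5, 0.5, -1.0])
print(ratio, pair)
```

In this toy setting the first quantity shrinks as the offset decreases relative to the square size, whereas the second stays close to 1: the linear combination nearly vanishes although no two atoms nearly cancel each other.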

We now study in more detail the RLI property. In particular, we examine the main difference between
RLI and the well known Restricted Isometry Property (RIP) [3]. The restricted isometry condition assumes
that a collection of vectors behaves almost like an orthonormal system but only for sparse linear combinations.
Specifically, any linear combination of $K$ elements (where $K \ll |\mathcal{D}|$) in the dictionary satisfies:

$(1 - \delta_K) \|a\|_2^2 \le \| \sum_{i=1}^K a_i \phi_{\gamma_i} \|_2^2 \le (1 + \delta_K) \|a\|_2^2$,

where the parameter $\delta_K$ takes a small positive value. By imposing the RIP on the dictionary $\mathcal{D}$ with $\delta_K \ll
1$, the norm of any sparse linear combination of atoms is guaranteed to be large (i.e., larger than $\sqrt{1 - \delta_K} \|a\|_2$).
In our case, contrary to the RIP, we are interested in linear combinations of atoms that nearly vanish. The
RLI property imposes in this case the existence of two atoms in the support that approximately cancel each
other. Consequently, RLI can be seen as a weak form of RIP, where we allow the norm of linear combinations to
be close to zero provided that the condition in Eq. (7) is satisfied. In particular, any dictionary $\mathcal{D}$ that satisfies
the RIP with a parameter $\delta_K$ will be $(K, \sqrt{1 - \delta_K}, 0)$-RLI. Indeed, since $\| \sum_{i=1}^K a_i \phi_{\gamma_i} \|_2 \ge \sqrt{1 - \delta_K} \|a\|_2$
holds for any subset of $K$ dictionary elements, the condition in Eq. (6) is never satisfied (when $\epsilon = \sqrt{1 - \delta_K}$).

Let us consider a simple example to compare the new RLI property with the common ways of characterising
dictionaries, namely, the coherence [18] and the restricted isometry property [3].



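Example 1 (Dictionary of translated box functions). Let $\mathcal{H} = L^2(\mathbb{R})$ and define the box function

$$v(t) = \begin{cases} 1, & \text{if } t \in [0, 1] \\ 0, & \text{otherwise.} \end{cases}$$

We consider the infinite-size dictionary $\mathcal{D}_{box} = \{T_\tau v = v_\tau : \tau \in \mathbb{R}\}$, where $T_\tau$ is the translation operator by $\tau$.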
The dictionary has the following properties: 



• $\mathcal{D}_{box}$ is RIP with a constant $\delta_K(\mathcal{D}_{box})$ equal to 1, for any $K \geq 2$.

• The coherence of $\mathcal{D}_{box}$ is equal to 1.



• Vbox is K, e, 1(4^'' - 1) -RLI for K >1 and e e 



The RIP constant $\delta_K(\mathcal{D}_{box})$ is 1 for any $K \geq 2$, as two box functions can be made arbitrarily close to each
other. Similarly, the coherence of this dictionary is equal to 1. Nevertheless, the dictionary $\mathcal{D}_{box}$ is $K$-RLI for
any $K \geq 1$, since $\alpha$ goes to zero when $\epsilon$ tends to zero. One can understand the fact that $\mathcal{D}_{box}$ is RLI intuitively:
if a linear combination of box functions has a small norm, there exist at least two box functions that nearly



cancel each other. Note that a = ey^{4^ — 1) has a strong dependence on K. In other words, when a is 
fixed, $\epsilon$ can be made arbitrarily small by increasing the number of atoms $K$. As the proof of the robust linear
independence of $\mathcal{D}_{box}$ is rather technical and not essential to the main understanding of the paper, it is given
in Appendix D.

Even if the dictionary $\mathcal{D}_{box}$ hardly satisfies the RIP and is highly coherent, it is still an interesting one in
our framework. Indeed, it satisfies the key property that two sparse signals that are close in the $L^2$ sense have
at least two approximately similar features. When applied to our registration problem, this guarantees the
existence of two features related approximately by a transformation $\eta_0$ in the $L^2$ sense when $d(p, q)$ remains
small. This property is at the core of our registration algorithm since we infer the global transformation by
looking at the local transformations between the features.
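
The coherence and RIP statements for the box dictionary can be checked by hand, since the inner product of two unit boxes is simply the length of their overlap. The following Python sketch is only illustrative: the helper names `box_inner` and `box_dist` are ours and do not come from the paper. It shows that the inner product of two translated boxes approaches 1, and their $L^2$ distance approaches 0, as the shift shrinks, which is why both the coherence and the RIP constant are equal to 1.

```python
import math

def box_inner(tau1, tau2):
    """Inner product of the unit boxes supported on [tau1, tau1+1] and [tau2, tau2+1].

    It equals the length of the overlap of the two supports.
    """
    return max(0.0, 1.0 - abs(tau1 - tau2))

def box_dist(tau1, tau2):
    """L2 distance between two translated boxes (each has unit norm)."""
    return math.sqrt(2.0 - 2.0 * box_inner(tau1, tau2))

# As the shift goes to zero, the atoms become nearly identical:
# the inner product tends to 1 and the distance tends to 0.
for shift in (0.5, 0.1, 0.001):
    print(shift, box_inner(0.0, shift), box_dist(0.0, shift))
```

With coefficients $a = (1, -1)$ and a shift $s \leq 1$, the squared norm of the combination is $2s$ while $\|a\|_2^2 = 2$, so no RIP constant smaller than 1 can hold for $K = 2$.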

3.3.3 Transformation inconsistency 

The second dictionary property that is important to study the performance of our algorithm is the transformation
inconsistency, which measures the difference in the effect of the same transformation on distinct atoms in the
dictionary. It is formally defined as follows for parametric dictionaries given by Eq. (1).



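^More precisely, this means that there exist a $\gamma_i$ and a $\delta_j$ such that $\|U(\eta_0)\phi_{\gamma_i} - \phi_{\delta_j}\|_2$ is small.

Definition 4. The transformation inconsistency $\rho$ of a parametric dictionary $\mathcal{D}$ is equal to:

$$\rho = \sup_{\eta \in \mathcal{T},\, \eta \neq I} \; \sup_{\gamma, \gamma' \in \mathcal{T}_d} \frac{\|U(\eta)\phi_{\gamma'} - \phi_{\gamma'}\|_2}{\|U(\eta)\phi_{\gamma} - \phi_{\gamma}\|_2},$$

where $I$ is the identity transformation. The transformation inconsistency $\rho$ is always larger than or equal to
1. Furthermore, when $\mathcal{T}$ is abelian, the transformation inconsistency takes its minimal value and is equal to 1.
Indeed, for any $\gamma, \gamma'$ in $\mathcal{T}_d$ and $\eta \in \mathcal{T}$, we have (writing $\phi_\eta = U(\eta)\phi$):

$$\frac{\|U(\eta)\phi_{\gamma'} - \phi_{\gamma'}\|_2}{\|U(\eta)\phi_{\gamma} - \phi_{\gamma}\|_2} = \frac{\|U(\gamma')(\phi_\eta - \phi)\|_2}{\|U(\gamma)(\phi_\eta - \phi)\|_2} = \frac{\|\phi_\eta - \phi\|_2}{\|\phi_\eta - \phi\|_2} = 1.$$

Hence, taking the supremum over all $\eta \in \mathcal{T}$ and atoms $\gamma, \gamma'$ in $\mathcal{T}_d$ results in having $\rho = 1$. This is expected
since when $\mathcal{T}$ is abelian, a fixed transformation acts on all atoms similarly.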
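
The abelian case can be checked numerically on a toy model. The sketch below is an illustration under our own assumptions, not the setting of the paper: signals are discrete vectors, the group is the set of circular shifts (which is abelian and norm-preserving), and `inconsistency_ratio` is a hypothetical helper name. For any pair of atoms, the ratio appearing in Definition 4 evaluates to 1 up to rounding.

```python
import numpy as np

def inconsistency_ratio(phi, eta, gamma, gamma_prime):
    """Ratio ||U(eta)phi_g' - phi_g'|| / ||U(eta)phi_g - phi_g|| for circular shifts.

    phi is the mother function; eta, gamma, gamma_prime are integer shifts.
    """
    def change(shift):
        atom = np.roll(phi, shift)  # atom phi_gamma = U(gamma) phi
        return np.linalg.norm(np.roll(atom, eta) - atom)
    return change(gamma_prime) / change(gamma)

rng = np.random.default_rng(0)
phi = rng.standard_normal(64)  # arbitrary toy mother function
print(inconsistency_ratio(phi, eta=3, gamma=5, gamma_prime=40))
```

Replacing the shifts by a non-commutative family (for instance rotations combined with translations on a 2D grid) is what allows the ratio to move away from 1.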

On the other hand, a large value of the transformation inconsistency $\rho$ (i.e., $\rho \gg 1$) means that there
exist two atoms in the dictionary that are affected in a very different way when they are subject to the same
transformation. The transformation inconsistency plays a key role in our registration algorithm. Indeed, as 
the global transformation between two sparse patterns is estimated from one of the relative transformations 
between features, it is preferable that transformations act in a similar way on all the features of the sparse 
patterns for more consistent registration. That means that dictionaries with small transformation inconsistency 
provide better registration performance. 

In order to outline the importance of this novel property in our registration framework, we give a few 
illustrative examples of dictionaries with different transformation inconsistency parameters. 



Example 2 (Dictionary with quasi-isotropic mother function, $\mathcal{T} = SE(2)$). We consider $\mathcal{T}$ to be the Special
Euclidean group ($\mathcal{T} = SE(2)$). That is, $\mathcal{T}$ accounts for translations, rotations and combinations of those. We
consider an ellipse-shaped mother function $\phi$ as shown in Fig. 2 (a) with anisotropy $r$. Then, we suppose
for the sake of simplicity that $\mathcal{T}_d = \mathcal{T}$ (i.e., the dictionary is built by applying all transformations of $\mathcal{T}$ to the
generating function $\phi$).

We illustrate in Fig. 2 (b) the effect of a transformation $\eta$, which is a simple rotation, on two different atoms
with parameters $\gamma$ and $\gamma'$ positioned at different points in the 2D plane. While the rotation of the atom
parametrized by $\gamma$ induces a very slight change on it (when $r \approx 1$), the same rotation applied to the atom
$\phi_{\gamma'}$ completely changes its position. Hence, the transformation $\eta$ has a very different impact on atoms $\phi_\gamma$ and
$\phi_{\gamma'}$, and we get $\rho \to \infty$ from Definition 4. Therefore, when the generating function $\phi$ approaches isotropy, the
transformation inconsistency grows to infinity.

In this example, our registration algorithm is not guaranteed to have a small error. To illustrate it, let us
consider the patterns $p$ and $q$ illustrated in Fig. 2 (c), which are each composed of two atoms whose coefficients
are all equal. The distance $d(p, q)$ between the patterns can be made arbitrarily small with a generating function
that is close to isotropic (i.e., $r \approx 1$) while the minimal distance $d_a(p, q)$ in our algorithm remains large.
Indeed, since our algorithm considers only relative transformations between pairs of atoms, the estimated global
transformation between the patterns can only be equal to a combination of a translation and a rotation of $\pi/2$.
However, when $r \approx 1$, the optimal transformation is clearly the identity, which cannot be selected with our
algorithm: this results in a large registration error $d_a(p, q) - d(p, q)$. Note that the error here is entirely related
to the fact that the transformation inconsistency $\rho$ is large, and not to the RLI property since the dictionary
under consideration here is robustly linearly independent for small values of the sparsity $K$.



Example 3 (Dictionary built on an elongated mother function, $\mathcal{T} = SE(2)$). Similarly to the previous example,
we consider the transformation group $\mathcal{T} = SE(2)$ and the fact that $\mathcal{T}_d = \mathcal{T}$. However, the dictionary is now
built on an elongated mother function as shown in Fig. 3 (a). As in the previous example, we can make the
transformation inconsistency $\rho$ very large by taking elongated atoms (large $L$) and a transformation $\eta$ that is
a small translation, as shown in Fig. 3 (b). It is again possible to construct an example where the registration
algorithm performs poorly (see Fig. 3 (c)): the set $\mathcal{T}^{p,q}$ of transformations between features in each sparse
pattern contains only translations and rotations of $\pi/2$. Therefore, any candidate transformation $\eta \in \mathcal{T}^{p,q}$ results
in a large value of the global registration error term $\|U(\eta)p - q\|_2$; the optimal global transformation is the
identity in this case, which leads to a small value of the minimal distance $d(p, q)$ between the patterns when $L$
is large.

To be complete, we should note that the one-to-one mapping assumption defined in Section 2.2 for the
function $\gamma \mapsto U(\gamma)\phi$ is not satisfied in Examples 2 and 3 since $\phi$ has a rotational symmetry of $\pi$. In this case,

Figure 2: Example of a dictionary where the transformation inconsistency $\rho$ is large. (a): Mother function
of the dictionary with anisotropy $r$. (b): Atoms $\phi_\gamma$, $\phi_{\gamma'}$, and a transformation $\eta$ that leads to a large
transformation inconsistency $\rho$. (c): Examples of patterns $p$ (atoms represented with solid line) and $q$ (atoms
represented with dashed line) where our algorithm has a large registration error $d_a(p, q) - d(p, q)$.



Figure 3: Example of a dictionary where the transformation inconsistency $\rho$ is large. (a): Mother function of
the dictionary, where $L$ is the length of the atom. (b): Atoms $\phi_\gamma$, $\phi_{\gamma'}$, along with the results of a transformation
$\eta$ that causes the transformation inconsistency $\rho$ to be large. (c): Examples of patterns $p$ (atoms represented
with solid line) and $q$ (atoms represented with dashed line) where our algorithm has a large registration error
$d_a(p, q) - d(p, q)$.



a slightly more complicated definition of the transformation inconsistency $\rho$ has to be made to avoid having
$\rho = \infty$ (with the definition of $\rho$ given in Definition 4, we obtain $\rho = \infty$ by setting $\eta$ to be a rotation of $\pi$, $\gamma$ to
be the identity and choosing any $\gamma'$ different from $\gamma$). The main intuitions of the transformation inconsistency
$\rho$, as defined in Definition 4, nevertheless hold when $\phi$ has a finite number of symmetries. We study in detail the
generalization of the transformation inconsistency $\rho$ to the case where $\phi$ has symmetries in $\mathcal{T}$ in Appendix B.

Example 4 (Dictionary built on translations and isotropic dilations, $\mathcal{T} = \mathcal{T}_d = \mathbb{R}^2 \times \mathbb{R}_+$). Finally, in this
third example, we let $\mathcal{T}$ be the group of translations and isotropic dilations. The generating function of the
dictionary could have any form, as long as its support is much smaller than the dimension of the image. For
example, we can choose a circle-shaped mother function, as depicted in Fig. 4 (a). Then, we consider the scenario
where two atoms $\phi_\gamma$ and $\phi_{\gamma'}$ are separated by $z$ (where $z$ is considered to be very large), as illustrated in
Fig. 4 (b). A transformation $\eta$ that consists of a small isotropic dilation has a very different effect on the two atoms.
In particular, the transformation $\eta$ applied to $\phi_{\gamma'}$ results in an atom that has no intersection with $\phi_{\gamma'}$, while the
same transformation has almost no effect on $\phi_\gamma$, i.e., $U(\eta)\phi_\gamma \approx \phi_\gamma$. Thus, the transformation inconsistency is
very high and $\rho \approx \infty$ according to Definition 4. In Fig. 4 (c), we illustrate why this may cause a problem in our
registration algorithm: we consider the two sparse patterns $p$ and $q$ composed of two features each, where the
coefficients of all the atoms are equal. It is not hard to see that when $\eta$ is a small dilation, the optimal global
transformation between both patterns is the identity. At the same time, our algorithm can only estimate a global
transformation that is a dilation (combined possibly with a translation) since all transformations between pairs
of atoms in $p$ and $q$ consist of combinations of dilation and translation.

Overall, the above examples suggest that, whenever the transformation inconsistency of the dictionary is
large, one may construct an example where our registration algorithm poorly approximates the transformation
invariant distance. In the general setting where $\mathcal{T}$ is any transformation group (and $\mathcal{T}_d = \mathcal{T}$ for the sake of
simplicity), such an example of failure could be constructed as follows. The basic idea is to build two patterns
$p$ and $q$ of the form $p = \phi_\gamma + \phi_{\gamma'}$ and $q = U(\eta_1)\phi_\gamma + U(\eta_2)\phi_{\gamma'}$ for which: (i) $p \approx q$, (ii) $\|U(\eta_1)p - q\|_2$ and
$\|U(\eta_2)p - q\|_2$ are large (with respect to $\|p - q\|_2$). The optimal transformation between $p$ and $q$ is then simply
the identity, whereas the transformations tested in our algorithm (namely $\eta_1$ and $\eta_2$, along with $\eta_1 \circ \gamma \circ (\gamma')^{-1}$
and $\eta_2 \circ \gamma' \circ \gamma^{-1}$) result in a poor registration performance as they all differ from the identity transformation.

Figure 4: Example of a dictionary where the transformation inconsistency $\rho$ is large. (a): Mother function of
the dictionary. (b): Atoms $\phi_\gamma$, $\phi_{\gamma'}$, and transformation $\eta$ that causes $\rho$ to be large. (c): Examples of patterns $p$
(atoms represented with solid line) and $q$ (atoms represented with dashed line) where our algorithm has a large
registration error $d_a(p, q) - d(p, q)$.

In more detail, when $\rho \gg 1$ we know that there exist two atoms $\phi_\gamma$ and $\phi_{\gamma'}$, with $\gamma \in \mathcal{T}$ and $\gamma' \in \mathcal{T}$,
along with a transformation $\eta_1$ for which $\|U(\eta_1)\phi_\gamma - \phi_\gamma\|_2 \approx 0$ while $\|U(\eta_1)\phi_{\gamma'} - \phi_{\gamma'}\|_2$ is large. By setting
$\eta_2 = (\gamma' \circ \gamma^{-1}) \circ \eta_1 \circ (\gamma \circ (\gamma')^{-1})$, we get that $\|U(\eta_2)\phi_{\gamma'} - \phi_{\gamma'}\|_2 = \|U(\eta_1)\phi_\gamma - \phi_\gamma\|_2 \approx 0$. Hence, the norm $\|p - q\|_2$
is necessarily small since $\|p - q\|_2 = \|\phi_\gamma + \phi_{\gamma'} - U(\eta_1)\phi_\gamma - U(\eta_2)\phi_{\gamma'}\|_2 \leq \|\phi_\gamma - U(\eta_1)\phi_\gamma\|_2 + \|\phi_{\gamma'} - U(\eta_2)\phi_{\gamma'}\|_2$.
Besides, we know by construction that $\|U(\eta_1)\phi_{\gamma'} - \phi_{\gamma'}\|_2$ is large and $\|U(\eta_2)\phi_\gamma - \phi_\gamma\|_2$ is also generally large
since the group $\mathcal{T}$ is non-abelian. This gives us, in general, large values of $\|U(\eta_1)p - q\|_2$ and $\|U(\eta_2)p - q\|_2$.
This construction shows that, when the dictionary has a large inconsistency parameter, one can find patterns
for which the registration algorithm fails to recover the right global transformation.
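
The identity used for $\eta_2$ can be spelled out step by step. The derivation below relies on two properties that we assume here: $U$ is a group representation, i.e., $U(\gamma \circ \delta) = U(\gamma)U(\delta)$, and every $U(\gamma)$ preserves the $L^2$ norm. It is a sketch under these assumptions rather than a restatement of the proof in the paper.

```latex
\begin{align*}
U(\eta_2)\phi_{\gamma'}
  &= U(\gamma' \circ \gamma^{-1})\, U(\eta_1)\, U(\gamma \circ (\gamma')^{-1})\, U(\gamma')\phi
   = U(\gamma' \circ \gamma^{-1})\, U(\eta_1)\phi_{\gamma}, \\
\phi_{\gamma'}
  &= U(\gamma' \circ \gamma^{-1})\, \phi_{\gamma}, \\
\|U(\eta_2)\phi_{\gamma'} - \phi_{\gamma'}\|_2
  &= \|U(\gamma' \circ \gamma^{-1})\,\bigl(U(\eta_1)\phi_{\gamma} - \phi_{\gamma}\bigr)\|_2
   = \|U(\eta_1)\phi_{\gamma} - \phi_{\gamma}\|_2 .
\end{align*}
```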

In general, the above examples show that it is better to choose a dictionary with a small transformation 
inconsistency (i.e., p small) to have good registration performance irrespectively of the patterns to be aligned. 

The performance of the registration algorithm depends on the transformation inconsistency as well as on the
robust linear independence of the dictionary, as shown in Theorem 1. The success of our registration algorithm
for all sparse signals in the dictionary is guaranteed when the RLI and transformation inconsistency conditions
are satisfied. Note that the conditions on the dictionary properties are essentially tight, as one can construct an
example where our algorithm fails whenever one of the parameters is large enough. However, the performance
bound should be interpreted qualitatively rather than quantitatively. It provides two rather intuitive
conditions for our algorithm to provide low registration error. In order to use this bound quantitatively, one
would have to be able to compute explicitly the newly defined properties on generic dictionaries. We stress
here that such a bound could not have been established with traditional measures for characterising
dictionaries, namely coherence or restricted isometry property constant. Finally, we remark that the result in
Theorem 1 can be used to bound the registration error $E(I_1, I_2, p, q)$ thanks to Proposition [5]. The price to pay
in this case is the approximation error $\|I_1 - p\|_2 + \|I_2 - q\|_2$.

4 Image registration experiments 

In this section, we evaluate the performance of our algorithm in image registration experiments. We first
describe the implementation choices in our registration algorithm. Then we study its performance for different
dictionaries and put the results in perspective with the theoretical guarantees in Section 3. Next, we present
illustrative image registration experiments with simple test images and classification tests with handwritten
digits. Finally, we provide some simple comparisons with baseline registration algorithms with simple features
from the computer vision literature.

4.1 Algorithm implementation 

In all the following experiments, we focus on achieving invariance to translation, rotation and scaling. Invariance
to these transformations is indeed considered to be a minimal requirement in invariant image pattern recognition.
These three operations generate the group of similarities that we denote by $\mathcal{T} = SIM(2)$. Any element in $\mathcal{T}$ is
therefore indexed by 4 parameters: a translation vector $b = (b_x, b_y)$, a dilation $a$ and a rotation parameter $\theta$. We
now describe the sparse approximation algorithm and the dictionary design used in our experiments.

4.1.1 Sparse approximation algorithm 

There are many methods to construct sparse approximations of images. In our experiments, we use a modified
implementation of the Matching Pursuit (MP) [13] algorithm, as MP is a simple algorithm that works
relatively well in practice. It is an iterative algorithm that successively identifies the atoms in $\mathcal{D}$ that best match
the image to be approximated. More precisely, MP iteratively computes the correlation between the atoms in
$\mathcal{D}$ and the signal residual, which is obtained by subtracting the contributions of the previously chosen atoms
from the original image. At each iteration, the atom with the highest correlation is selected and the residual
signal is updated. While the standard MP algorithm solves the sparse approximation problem without positivity
constraint on the coefficients, we propose a slightly modified algorithm (that we call Non negative Matching
Pursuit (NMP)) in order to select atoms that have the highest positive correlation with the residual signal.
This choice is driven by the objective of having a part-based signal expansion, where each feature participates
in constructing the signal representation. The NMP algorithm is formally defined in Algorithm 2.

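Algorithm 2 Non negative Matching Pursuit (NMP) for feature extraction
Input: image $I$, sparsity $K$, dictionary $\mathcal{D}$.
Ensure: coefficients $c$, support $\Gamma$.

1. Initialization of the residual: $r_0 \leftarrow I$ and support: $\Gamma \leftarrow \emptyset$.

2. While $1 \leq i \leq K$, do:

2.1 Selection step:

$\gamma_i \leftarrow \arg\max_{\gamma \in \mathcal{T}_d} \langle r_{i-1}, \phi_\gamma \rangle$

$\Gamma \leftarrow \Gamma \cup \{\gamma_i\}$.

2.2 If $\langle r_{i-1}, \phi_{\gamma_i} \rangle < 0$, go to 3.

2.3 Update step:

$c_i \leftarrow \langle r_{i-1}, \phi_{\gamma_i} \rangle$

$r_i \leftarrow r_{i-1} - c_i \phi_{\gamma_i}$.

3. Return $c$, $\Gamma$.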
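
The steps of Algorithm 2 can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in the experiments: the function name `nmp`, the representation of the dictionary as a matrix `atoms` with one unit-norm atom per row, and the optional residual threshold `tol` are assumptions made for the example. Unlike the listing above, the sketch stops before adding an atom whose correlation is not strictly positive, so that the returned support only contains atoms with positive coefficients.

```python
import numpy as np

def nmp(image, atoms, K, tol=0.0):
    """Non-negative matching pursuit sketch.

    image : array with n_pixels entries (flattened internally)
    atoms : (n_atoms, n_pixels) array, each row a unit-norm atom (assumed layout)
    K     : maximum number of selected atoms
    tol   : optional stopping threshold on the residual norm
    """
    residual = np.asarray(image, dtype=float).ravel().copy()
    coeffs, support = [], []
    for _ in range(K):
        corr = atoms @ residual            # correlations <r_{i-1}, phi_gamma>
        best = int(np.argmax(corr))        # selection step
        if corr[best] <= 0:                # no atom with positive correlation left
            break
        coeffs.append(float(corr[best]))   # update step: coefficient
        support.append(best)
        residual -= corr[best] * atoms[best]   # update step: residual
        if np.linalg.norm(residual) <= tol:
            break
    return np.array(coeffs), support, residual

# Toy usage with a trivial orthonormal dictionary (illustration only).
coeffs, support, residual = nmp([3.0, -1.0, 2.0], np.eye(3), K=3)
print(coeffs, support, residual)
```

In the toy call, the negative entry of the signal is never approximated, which is the intended effect of the positivity constraint.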



One way to choose the sparsity $K$ consists in controlling the approximation error of $I_1$ and $I_2$. Specifically,
we can impose a stopping criterion in the NMP algorithm of the form $\|r_K\|_2 < \epsilon$, where $r_K$ is the residual at
iteration $K$ and $\epsilon$ is a fixed threshold controlling the approximation error. When $\epsilon$ is chosen to be small enough,
this guarantees a relatively small sparse approximation error.

Note that the complexity of NMP is governed by the selection step, hence $O(K|\mathcal{D}|)$ operations need to be
performed. Besides, the complexity of solving (P) using Algorithm [T] is 0{K^N) with N = max(7Vi,7V2) with 
$N_1$ and $N_2$ respectively the dimensions of the discretized images corresponding to $p$ and $q$. Therefore, if the
sparse approximation step is necessary for registration, the complexity of the overall registration algorithm is
thus 0{K\'D\ + K'^N). Depending on the factor -j^, the complexity might be governed by either step of the 
algorithm. Overall, the choice of $K$ results from a trade-off between approximation error (hence registration
performance) and computational complexity. Finally, note that in applications involving the registration of a
test image with possibly many training images, the sparse approximations of the training images are computed
offline. Hence, only the sparse approximation of the test image needs to be computed during the test phase.

4.1.2 Choice of the dictionary 

We now discuss the choice of the dictionary $\mathcal{D}$ that is used in our experiments. As pointed out in Eq. (1), the
dictionary $\mathcal{D}$ is simply constructed by applying geometric transformations $\gamma \in \mathcal{T}_d$ to a mother function $\phi$. We
thus need to choose appropriately the mother function $\phi$ as well as the discretisation for constructing the subset
$\mathcal{T}_d$ of $\mathcal{T}$. In the light of the derived analytical results, ideally we would like to design a dictionary that satisfies
the following constraints:

• Images should have a good sparse approximation in the dictionary (assumption [A1] of the analysis).

• The dictionary should be robustly linearly independent (Theorem 1).

• The transformation inconsistency parameter of the dictionary should not be too large (Theorem 1).

We propose to use an anisotropic Gaussian generating function as it has been shown to provide good
approximation results in natural images [7]. It is defined as follows:

where $\nu \geq 1$ controls the anisotropy and the normalization factor is chosen to have $\|\phi\|_2 = 1$. The choice of
$\nu \approx 1$ results in an isotropic mother function that causes the transformation inconsistency $\rho$ to be very large
(see Example 2). The transformation inconsistency is also large when the value of $\nu$ is chosen to be large (see
Example 3). In our experiments, we have generally chosen an intermediate value $\nu = 4$ as a compromise between
the two extreme values.

The dictionary $\mathcal{D}$ is built by transforming the generating function $\phi$ with all transformations in $\mathcal{T}_d$, where $\mathcal{T}_d$
is a discretization of the transformation group $\mathcal{T}$. In our experiments, we consider the following discretization:

• The translation parameters can take any positive integer value smaller than the image dimension. 

• The rotation angles are uniformly discretized in [0, tt) with a step size of ^. We have seen experimentally 
that this step size results in a good directional accuracy. A denser discretization comes at the expense of 
higher computational cost. 

• The scaling parameters are sampled uniformly on a logarithmic scale with a step size of half an octave. 
This step size results in a compromise between the sparse approximation error and an oversampling of the 
scale space that might lead to wrong registration (and a too high computational complexity). We set the 
minimum scale to one, and the maximum scale is designed to have 99% of the energy of a centred atom 
inside the image domain. 
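
The discretization described in the list above can be sketched as a small grid builder. The code is only an illustration: `build_parameter_grid` is a hypothetical helper, the number of rotation samples `n_angles` is left as a free parameter rather than set to the step used in the experiments, and `max_scale` is passed in directly instead of being derived from the 99% energy criterion.

```python
import numpy as np

def build_parameter_grid(image_size, n_angles, max_scale, min_scale=1.0):
    """Return the sampled translations, rotation angles and scales."""
    # Translations: positive integers smaller than the image dimension.
    translations = np.arange(1, image_size)
    # Rotations: n_angles uniform samples of [0, pi); the resulting step
    # pi / n_angles is an assumption of this sketch.
    angles = np.pi * np.arange(n_angles) / n_angles
    # Scales: half-octave steps on a logarithmic axis, from min_scale to max_scale.
    n_scales = int(np.floor(2.0 * np.log2(max_scale / min_scale) + 1e-9)) + 1
    scales = min_scale * 2.0 ** (0.5 * np.arange(n_scales))
    return translations, angles, scales

translations, angles, scales = build_parameter_grid(image_size=75, n_angles=8, max_scale=16.0)
# Size of the discretized parameter set (two translation coordinates).
print(len(translations) ** 2 * len(angles) * len(scales))
```

The printed product gives a sense of how quickly the dictionary size grows with a denser sampling of angles or scales, which is the computational cost mentioned in the list.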

Fig. 5 illustrates several examples of part-based representations obtained with NMP and a dictionary of Gaussian
atoms, as described above. We observe that the part-based decomposition manages to approximate well the main
geometric characteristics of the image. Furthermore, the same features are used in the different approximations,
up to some geometrical transformation that corresponds to the relative transformation between the different
versions of the original image. This is exactly the property that is at the core of our registration algorithm.



Figure 5: Sparse approximations of transformed versions of the 'Car' image, computed with NMP and a
dictionary constructed from a Gaussian generating function, with $\nu = 4$. The first row (panels (a) to (c)) shows
the original images, size 75 x 75 pixels. The second row (panels (d) to (f)) shows the corresponding sparse
approximations with a sparsity of $K = 15$ atoms.



4.1.3 Registration refinement 

Our registration algorithm estimates a transformation in the set $\mathcal{T}^{p,q} \subset \{\gamma \circ \delta^{-1} : \gamma, \delta \in \mathcal{T}_d\}$, where $\mathcal{T}_d$ is the
chosen discretization of the parameter space $\mathcal{T}$. In order to reduce the registration error that is due to the
discretization of the dictionary, we have chosen in the experiments to extend our registration algorithm with a
gradient descent technique that refines the estimated transformation. Hence, even if the optimal transformation
$\eta_0$ is not located on the lattice formed by the discretisation of the transformation parameter space, the additional
local optimization step allows convergence to the optimal transformation if it lies close to the estimate computed
by our registration algorithm.
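
To make the candidate set concrete, the relative transformations between atoms can be enumerated with homogeneous matrices. The sketch below rests on assumptions that are ours and not the paper's: an element of SIM(2) is represented by the 3 x 3 matrix acting on points as $x \mapsto a R_\theta x + b$, composition is matrix multiplication, and candidates are formed as $\delta_i \circ \gamma_j^{-1}$ with $\gamma_j$ the atom parameters of $p$ and $\delta_i$ those of $q$. The names `sim2_matrix` and `candidate_transformations` are hypothetical.

```python
import numpy as np

def sim2_matrix(bx, by, a, theta):
    """Homogeneous matrix of the similarity x -> a * R(theta) x + (bx, by)."""
    c, s = a * np.cos(theta), a * np.sin(theta)
    return np.array([[c, -s, bx],
                     [s,  c, by],
                     [0.0, 0.0, 1.0]])

def candidate_transformations(gammas, deltas):
    """All relative transformations delta_i o gamma_j^{-1} (as 3x3 matrices)."""
    return [d @ np.linalg.inv(g) for d in deltas for g in gammas]

# Toy check: if every atom of q is the image of an atom of p under the same eta,
# then eta belongs to the candidate set.
eta = sim2_matrix(3.0, -1.0, 1.5, 0.5)
gammas = [sim2_matrix(1.0, 2.0, 1.0, 0.3), sim2_matrix(4.0, 0.0, 2.0, 1.0)]
deltas = [eta @ g for g in gammas]
candidates = candidate_transformations(gammas, deltas)
print(len(candidates), any(np.allclose(c, eta) for c in candidates))
```

For patterns with $K$ atoms each, this enumeration yields $K^2$ candidates, each of which would then be scored by the registration error and optionally refined by the local descent step.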

Specifically, the problem consists in minimizing the objective function $J(\eta) = \|U(\eta)p - q\|_2^2$, where the
unknown transformation $\eta$ is constrained to be in $\mathcal{T}$. We present in Appendix C a local optimization technique
that respects the intrinsic geometry of our problem and formulate the gradient descent step.
In particular, we follow the same approach as Jacques et al. in [10].

^Formally, the Gaussian mother function does not satisfy the one-to-one mapping assumption of $\gamma \mapsto U(\gamma)\phi$. We circumvent this
by slightly modifying the definition of $\mathcal{T}^{p,q}$. We define the stabilizer of $\phi$ to be the set that keeps the mother function unchanged:
$S_\phi = \{\gamma : U(\gamma)\phi = \phi\}$. Then, we define $\mathcal{T}^{p,q} = \{\delta_i \circ \pi \circ (\gamma_j)^{-1} : 1 \leq i, j \leq K, \pi \in S_\phi\}$. For more details, refer to Appendix B.

4.2 Influence of the dictionary on the registration performance 




Figure 6: Gaussian mother functions with different values of the anisotropy $\nu$ (panels, from left to right: $\nu = 1$, $\nu = 1.5$, $\nu = 2$, $\nu = 2.5$, $\nu = 3$).

In a first set of experiments, we examine the influence of the dictionary choice on the registration
performance. We fix here the transformation group $\mathcal{T}$ to be the special Euclidean group $SE(2)$ (containing
translations and rotations). We consider that the dictionary mother function is a 2D anisotropic Gaussian
function. We vary the anisotropy parameter $\nu$ of the mother function to generate a class of different dictionaries.
Several generating functions obtained by varying the anisotropy parameter $\nu$ are illustrated in Fig. 6. Note
that the discretisation of the parameter space $\mathcal{T}_d$ is kept fixed for all dictionaries. We now study the registration
performance of each of these dictionaries.




Figure 7: Original images used in the first experiment: (a) $I_1$, (b) $I_2$.

In a first experiment, we test our registration algorithm with the images $I_1$ and $I_2$ illustrated in Fig. 7 for our
class of dictionaries. We first represent in Fig. 8 the mean sparse approximation error $\frac{1}{2}(\|I_1 - p\|_2 + \|I_2 - q\|_2)$
for decompositions with $K = 3$ atoms, when the anisotropy parameter $\nu$ in the dictionary mother function varies.
For the same class of dictionaries, we also measure the registration performance $E = \big|\, \|U(\eta_0)I_1 - I_2\|_2 - \|U(\hat{\eta})I_1 - I_2\|_2 \,\big|$,
where $\eta_0$ and $\hat{\eta}$ are respectively the optimal transformation (namely a rotation of $\pi/4$) and the estimated
transformation. The registration performance is also illustrated in Fig. 8. One can see clearly that the sparse
approximation error increases with the anisotropy of the mother function. Indeed, when the mother function
approaches isotropy, the dictionary approximates well the tennis balls in images $I_1$ and $I_2$. The registration
performance has however an opposite behaviour: the error decreases with increasing values of the anisotropy.
This suggests that the sparse approximation error is not the only quantity controlling the performance of the
registration algorithm, as predicted by our theoretical performance analysis. Indeed, using the same arguments
as in Example 2, we know that the transformation inconsistency parameter goes to infinity when the mother
function is isotropic: this explains the poor registration performance for generating functions that are close to
isotropic.

As the transformation inconsistency parameter appears crucial for the registration performance, we estimate
its value for the same class of dictionaries. This estimation is performed by applying the definition of the
transformation inconsistency, where the infinite set $\mathcal{T}$ is finely discretized. In a final step of the estimation,
the transformation $\eta \in \mathcal{T}$ that maximizes the transformation inconsistency is refined with a local gradient
descent search. Fig. 9 shows the estimated value of the transformation inconsistency parameter with respect to
the anisotropy of the generating function. One can see that the evolution of the transformation inconsistency
parameter is consistent with the theoretical analysis in Section 3. For near-isotropic atoms, the parameter $\rho$ is
large (Example 2). Similarly, when $\nu$ is large, the transformation inconsistency increases as shown in Example
3. Even though our estimation of the transformation inconsistency may not be perfectly accurate (due to the
discretization of $\mathcal{T}$), it confirms the tendencies described earlier in the theoretical analysis. It further contributes
to explaining the trade-off between approximation and registration error that has been illustrated in Fig. 8.

^In this set of experiments, we apply our registration algorithm without the gradient descent refinement. We do so in order to
focus exclusively on the performance of Algorithm 1 in terms of the considered dictionary.

^Note that we applied the definition in Eq. (24) since these atoms have a rotational symmetry of $\pi$.



Figure 8: Approximation error (solid) and registration error (dashed) for images in Fig. 7 as a function of the
anisotropy $\nu$ of the dictionary generating function. The sparsity $K$ is fixed to 3.

Figure 9: Estimation of the transformation inconsistency parameter for dictionaries built on Gaussian mother
functions with different anisotropy $\nu$.

We now study a second experiment where we consider that the transformation group is $\mathcal{T} = \mathbb{R}^2 \times \mathbb{R}_+$. That
is, $\mathcal{T}$ contains transformations that can be written as combinations of translation and isotropic dilation. We
construct another class of dictionaries by fixing the generating function to be an isotropic Gaussian (as shown
in Fig. 6 with $\nu = 1$) but we vary the step size that is used for the discretisation of the dilation parameter. More
precisely, the set of transformations $\mathcal{T}_d$ that is used to build the dictionary is constructed from $\mathcal{T}$ by imposing a
fixed uniform discretization of the translation parameter and a uniform discretization of the dilation parameter
whose step size can take different values. Note that the minimum and maximum scales are kept fixed in
all dictionaries and only the spacing between two consecutive scale parameters is varied. We finally measure
the sparse approximation performance with $K = 3$, as well as the registration accuracy that can be obtained
with this second class of dictionaries for the images $I_1$ and $I_2$ shown in Fig. 10. Both sparse approximation
and registration errors are computed similarly to the previous experiment. They are illustrated in Fig. 11 as a
function of the different values of the scale step size.




Figure 10: Original images used in the second experiment: (a) original image $I_1$, (b) transformed image $I_2$.

Figure 11: Approximation error (solid) and registration error (dashed) for images in Fig. 10 as a function of the
scale step size used for constructing the dictionary. The sparsity $K$ is fixed to 3.

We observe in Fig. 11 that the sparse approximation and registration errors have opposite behaviours with
respect to the scale space discretization. This is in line with our observations on the first experiment above.
Indeed, a fine discretisation leads to a small approximation error. At the same time, the registration is less
accurate when the discretisation is fine. Conversely, a coarser discretisation of the scale parameter results in a less
compact dictionary, hence in larger approximation errors, but better registration performance. These tendencies
can be explained using the arguments developed in Example 4.

In summary, these two experiments show that constructing a dictionary that guarantees a small
approximation error of the images is not enough to have a low registration error. As we have seen earlier in Section 3,
crucial parameters such as robust linear independence and transformation inconsistency have to be taken into
account in the design of the dictionary in order to reach good registration performance.

4.3 Illustrative examples 

We propose in this section some illustrative experiments that study the performance of our registration algorithm 
for determining the relative transformation between pairs of images, or for image classification. We further 
compare the properties of our registration algorithm to other baseline solutions for computing transformation 
invariant distances. 




Figure 12: Test images [9]: (a) Duck, (b) Car, (c) Bear. All images are resized to be of dimension 75 x 75 pixels.

In our first experiments, we consider the test images shown in Fig. 12, which have been collected from the
ALOI dataset [3]. We generate 100 random transformations and apply them to the test images. Each
transformation belongs to T and consists of a combination of translation, rotation and isotropic scaling. Both
components of the translation vector are smaller than half the image size and the isotropic scaling parameter
is constrained to lie in [0.5, 1.5]. These restrictions guarantee that most of the image energy lies in the image
space, possibly with some occlusions. We put no specific restriction on the rotation angle. Fig. 13 illustrates
some examples of transformed images.
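
As an illustration, the sampling protocol described above could be sketched as follows. The uniform sampling distributions, the function names and the homogeneous-matrix parameterisation (scaling and rotation followed by translation) are assumptions made here for illustration only; the text does not specify them.

```python
import numpy as np

def sample_similarity(image_size=75, rng=None):
    """Draw one hypothetical (translation, scale, rotation) triple."""
    rng = np.random.default_rng() if rng is None else rng
    half = image_size / 2.0
    b = rng.uniform(-half, half, size=2)    # each translation component below half the image size
    a = rng.uniform(0.5, 1.5)               # isotropic scale constrained to [0.5, 1.5]
    theta = rng.uniform(0.0, 2.0 * np.pi)   # no restriction on the rotation angle (radians)
    return b, a, theta

def similarity_matrix(b, a, theta):
    """3x3 homogeneous matrix of the map x -> a * R(theta) x + b (assumed convention)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[a * c, -a * s, b[0]],
                     [a * s,  a * c, b[1]],
                     [0.0,    0.0,   1.0]])

# 100 random transformations, as in the experiment described above.
rng = np.random.default_rng(0)
transforms = [sample_similarity(75, rng) for _ in range(100)]
```

Each triple can then be turned into a warp with `similarity_matrix` and applied to a test image with any image-resampling routine.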
Figure 13: Sample set of test images built by applying random geometric transformations to the Duck image.

We first examine the accuracy of our algorithm in estimating the correct global transformation between
pairs of images. We register each of the transformed test images with the original image and compute the
average registration accuracy over 100 such operations. Fig. 14 shows the average error in the translation,
scaling and rotation parameters when registering pairs of 'Duck' images for different numbers of features K in
the sparse image approximations. We see that for K > 10 our algorithm determines a very good approximation
$\hat{\eta} = (\hat{b}, \hat{a}, \hat{\theta})$ of the optimal transformation $\eta_0 = (b_0, a_0, \theta_0)$. That is, we have on average a translation error of
approximately 1 pixel, a scaling error of 0.02 and an angle error of 10 degrees.
(c) Rotation error: $\min(|\hat{\theta} - \theta_0|, 180 - |\hat{\theta} - \theta_0|)$

Figure 14: Errors in translation, scaling and rotation (in degrees) versus the sparsity K in the approximation of
'Duck' images. The optimal transformation is denoted by $\eta_0 = (b_0, a_0, \theta_0)$ and the estimated
transformation by $\hat{\eta} = (\hat{b}, \hat{a}, \hat{\theta})$. The results are averaged over 100 tests.
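
The three error measures reported in Fig. 14 can be sketched as below. The function names are hypothetical, angles are assumed to be in degrees, and the reduction modulo 180 before applying the min is an assumption added here so that the rotation error stays well defined for arbitrary angle differences.

```python
import numpy as np

def translation_error(b_hat, b0):
    """Euclidean norm of the difference between translation vectors."""
    return float(np.linalg.norm(np.asarray(b_hat, float) - np.asarray(b0, float)))

def scale_error(a_hat, a0):
    """Absolute difference between isotropic scale parameters."""
    return abs(a_hat - a0)

def rotation_error(theta_hat, theta0):
    """min(|d|, 180 - |d|) in degrees, with |d| first reduced modulo 180 (assumption)."""
    diff = abs(theta_hat - theta0) % 180.0
    return min(diff, 180.0 - diff)
```

With this convention, an estimate that is off by 170 degrees counts as a 10 degree error, which matches the rotation error formula of panel (c).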

We now compare our method with several baseline algorithms for computing distances that are invariant
to transformations. The first of these methods is based on the tangent distance [15], which approximates the
transformation-invariant distance between the images with the distance between two linear subspaces that can
be easily computed. Specifically, the authors in [15] approximate the distance $d(I_1, I_2)$ with:

$$d(I_1, I_2) \approx \min_{I_1' \in T(I_1),\; I_2' \in T(I_2)} \|I_1' - I_2'\|_2,$$



where $T(I_1)$ and $T(I_2)$ are the tangent planes to the manifolds of transformed images of $I_1$ and $I_2$ respectively,
evaluated at $I_1$ and $I_2$. The equations of $T(I_1)$ and $T(I_2)$ can be explicitly computed and the original problem
of computing the transformation-invariant distance reduces to solving a least squares problem [15]. We also
compare our method with an approach that solves the original problem (P') using a simple gradient descent
technique starting from the identity transformation. Finally, the last comparative scheme is simply based on the
computation of the regular Euclidean distance between the images $I_1$ and $I_2$. Note that in all three competitor
solutions, the distances are computed directly on the original images, whereas in our approach we use only the
sparse image approximations to compute the distance. We choose to do so since our aim here is to show that
our method can be used without explicitly using the complete images in the transformation estimation: a good
sparse approximation is indeed sufficient to obtain accurate registration results.
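
To make the least squares formulation of the tangent distance concrete, a minimal sketch is given below. It assumes that the images are vectorised and that the tangent vectors (for instance obtained by finite differences along each transformation parameter) are stacked as the columns of `T1` and `T2`; these names and this construction are illustrative assumptions, not the implementation of [15].

```python
import numpy as np

def tangent_distance(x1, x2, T1, T2):
    """Two-sided tangent distance between vectorised images x1 and x2.

    Points of the tangent planes are written x1 + T1 @ alpha and x2 + T2 @ beta,
    and the distance between the two planes is found by least squares.
    """
    A = np.hstack([T1, -T2])                             # unknowns: [alpha; beta]
    coeffs, *_ = np.linalg.lstsq(A, x2 - x1, rcond=None)
    residual = x1 - x2 + A @ coeffs                      # (x1 + T1 alpha) - (x2 + T2 beta)
    return float(np.linalg.norm(residual))
```

Since zero coefficients are always feasible, the returned value never exceeds the Euclidean distance between the two images; it vanishes only when the two tangent planes intersect, which is why the invariance it provides is local.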





(a) Euclidean distance (b) Tangent distance (c) Gradient descent (d) Our method (K = 10)



Figure 15: Average and standard deviation of intra- and inter-class distances for different methods. The blue
color denotes the intra-class distance while the green and red colors refer to the distance between Duck-Car and
Duck-Bear images respectively. The distance has been computed between one reference image ('Duck') and 100
randomly generated transformed images in each class. The intra-class distance should ideally be zero.



We extend the previous experiments towards classification of images. In particular, we compare the
transformation-invariant distance for images of the same class to the same distance computed between images
of different classes. Ideally, the first one (the intra-class distance) should be smaller than the latter one
(the inter-class distance) in order to obtain good classification performance. We start with a simple scenario
where the reference image is chosen to be the 'Duck' image in Fig. 12. We then compute the transformation-
invariant distance between the reference image and the transformed versions of images in the same class ('Duck'),
and in the other classes ('Car' and 'Bear'). Fig. 15 shows the average of the transformation-invariant distances
computed with the different methods. One can see that the Euclidean distance between images of the same class
is not significantly different from the distance between images of different classes. The tangent distance does
not improve the performance since this method provides only local invariance to transformations. Similarly, the
gradient descent approach converges to the correct transformation only when the latter is close enough to the initial
transformation. As this happens rarely, this approach does not provide results that are significantly different for
intra- and inter-class comparisons. In our method however, one can see that the intra-class distance is significantly
smaller than the inter-class distance. Fig. 16 further shows the evolution of the transformation-invariant
distance with respect to the sparsity of the images. We see that the intra-class distance is always smaller than
any of the inter-class distances in our algorithm, even for very small values of the sparsity K. This provides a
confirmation that salient geometric features in sparse images are crucial for proper registration. Hence, without
having a very accurate sparse representation of the patterns, our registration algorithm succeeds in obtaining an
approximation of the distances that allows at least to classify the simple patterns under test. Note that this
observation does not contradict the worst case theoretical analysis in which we assume that the sparse
approximation error is small. We observe in practice that, even when this assumption does not hold, one can still
obtain a good registration accuracy that is sufficient for the classification of simple signals. Finally, we note
that the results are essentially the same if we repeat the same experiments with a different reference image in
our dataset. Overall, our illustrative experiments so far show that, with a coarse approximation of the original
images in the dictionary, our approach succeeds in obtaining an accurate estimation of the transformation, and
the computation of the distances shows that the intra- and inter-class images are well distinguished. This is
an interesting property towards the development of registration algorithms in applications where access to the
original (high quality) images is not possible.
Figure 16: Evolution of the intra-class and inter-class transformation-invariant distance in the proposed
algorithm as a function of the number of features in the sparse images.

We extend the simple classification experiments proposed above and now study the performance of our
registration method in the more challenging task of transformation-invariant handwritten digit classification. We
use the digits '0' to '5' from the standard MNIST database of handwritten digits [1]. We construct the training
data by randomly choosing 100 images for each digit, which results in 600 training images. The test data
is constructed similarly: 100 images are taken in each class in order to generate 600 test images. Note that
the test data does not contain any of the training images. Finally, we apply to each test image a random
transformation built on translation, rotation and isotropic scaling. Our classifier then works as follows: each
test image is assigned the label of the digit in the training set that best aligns with it, or equivalently that
minimizes the transformation-invariant distance to the test image. In other words, the label of a test image
is chosen to be the label of its nearest neighbour in the training dataset, up to a geometrical transformation.
We compare the classification results when the transformation-invariant distance is computed with the different
methods proposed above. The classification results are shown in Table 2.
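
The classification rule described above is a nearest-neighbour rule with a pluggable distance. A minimal sketch follows; the function names are hypothetical, and the plain Euclidean distance is used only as a stand-in for whichever transformation-invariant distance is being evaluated.

```python
import numpy as np

def classify_nearest_neighbour(test_image, train_images, train_labels, distance):
    """Return the label of the training image closest to test_image under `distance`."""
    distances = [distance(test_image, train_image) for train_image in train_images]
    return train_labels[int(np.argmin(distances))]

def euclidean(u, v):
    """Baseline distance; any transformation-invariant distance can be passed instead."""
    return float(np.linalg.norm(np.asarray(u, float) - np.asarray(v, float)))
```

Swapping `euclidean` for a tangent distance, a gradient-descent distance or a registration-based distance changes the classifier without changing the rule itself, which is how the methods in Table 2 are compared.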





Method                                      Classification accuracy
Euclidean distance                          14 %
Tangent distance                            33 %
Gradient descent                            62 %
Proposed registration algorithm (K = 10)    86 %



Table 2: Handwritten digits classification accuracy for different approaches in computing transformation- 
invariant distances. 

One can see that using the Euclidean distance on the transformed test images results in a very poor classifier,
whose performance is actually close to that of a random classifier. Using the tangent distance results in some
improvement, but it is still far from the desired performance. This is due to the fact that the tangent
distance is appropriate only for local transformations, while the transformations that we consider are generally
of large magnitude. Similarly, the gradient descent approach does not perform well, since it is only guaranteed
to reach a local minimum. Using our registration method however, we achieve a relatively high classification rate,
which is by far the best performance among the compared methods. It is worth noting that the performance
of our algorithm (86% classification accuracy) is only slightly worse than the performance of a Euclidean
nearest neighbour classifier with aligned images (i.e., no transformations are applied on the test data), which
reaches a classification accuracy of 94%. The latter classifier provides an upper bound on the performance we
could achieve in our settings where test images are transformed.

Finally, note that existing methods in the literature achieve close to zero error rate on the MNIST database 
[T]. However, unlike the proposed approach, these methods generally do not support invariance to large transfor- 
mations. Furthermore, our method is general in the sense that it is not specific to handwritten digit classification 
and can be used in any application involving image alignment. 

4.4 Relation to feature-based methods 

The proposed registration method shares several similarities with feature-based approaches in the computer
vision literature. In such methods, we represent an image using a set of local features (keypoints) along with
high dimensional descriptors that describe the local behaviour of the image around the keypoints. In order to
accurately register two images using a feature-based approach, the following two conditions must be met:

• Keypoints covariance to transformations: The keypoints undergo the same transformation as the original
image.

• Descriptor invariance to transformations: The descriptors are oblivious to the transformation of the
original image.

Since these conditions are ideal and hard to satisfy in practice, inaccuracies generally happen in keypoint
locations and matching. To account for these issues, the registration process first excludes outlier keypoints
(i.e., the keypoints that are not consistent with most of the other keypoints). This is usually performed with the
RANSAC procedure [5]. The relative transformation between pairs of images is finally estimated as the most
likely global transformation based on the remaining (inlier) keypoints. Specifically, if $\{x_i\}_{i=1}^{N}$ and $\{y_i\}_{i=1}^{N}$
denote the positions of the matched inlier keypoints respectively in the first and second image, the registration
is performed by solving the following minimization problem:

$$\min_{\eta \in \mathcal{T}} \sum_{i=1}^{N} \|f(x_i, \eta) - y_i\|_2^2,$$

where $f(x_i, \eta)$ gives the position of the keypoint $x_i$ after the transformation with $\eta$. When $\mathcal{T} = SIM(2)$, the
minimum can be found by solving a system of normal equations [17].
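
For the similarity group, the least squares fit from matched keypoints becomes linear once the transformation is parameterised by $(a\cos\theta, a\sin\theta, b_x, b_y)$. The sketch below illustrates this under that assumed parameterisation; the function name and the use of a generic least squares solver (rather than explicitly formed normal equations, as in [17]) are choices made here for illustration.

```python
import numpy as np

def fit_similarity(X, Y):
    """Least squares similarity y ~ a * R(theta) x + b from matched points.

    X, Y: arrays of shape (N, 2) with N >= 2 matched keypoint positions.
    Returns (a, theta, b) with theta in radians.
    """
    X = np.asarray(X, float)
    Y = np.asarray(Y, float)
    n = X.shape[0]
    A = np.zeros((2 * n, 4))
    # Unknowns (u, v, bx, by) with u = a cos(theta), v = a sin(theta).
    A[0::2] = np.column_stack([X[:, 0], -X[:, 1], np.ones(n), np.zeros(n)])  # x-coordinates
    A[1::2] = np.column_stack([X[:, 1],  X[:, 0], np.zeros(n), np.ones(n)])  # y-coordinates
    (u, v, bx, by), *_ = np.linalg.lstsq(A, Y.reshape(-1), rcond=None)
    return float(np.hypot(u, v)), float(np.arctan2(v, u)), np.array([bx, by])
```

The system has four unknowns and two equations per match, which is why at least two matched keypoints are needed to estimate a similarity transformation.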

We compare now our registration approach to a baseline feature-based approach, where the features are 
built on the popular Scale Invariant Feature Transform (SIFT) [T^] and a RANSAC [5] method for rejecting 
outliers. We compare our approach to the SIFT-based solution for the estimation of large rotations. We consider 
the Duck image in Fig. 12 along with multiple transformed versions of this image obtained by rotation around
the center of the image. Fig. 17 illustrates the registration error versus the angle of rotation, for the SIFT-based
approach and for our registration method. The registration error is measured on the original images with
$\|U(\hat{\eta})I_1 - I_2\|_2$, where $\hat{\eta}$ is the estimated transformation. It can be seen that the estimated transformation with
the SIFT-based scheme becomes less accurate as the rotation angle increases. On the contrary, the performance
of our method is independent of the magnitude of the transformation. This confirms that SIFT keypoints are
not covariant to large rotations of the image.
Figure 17: Registration error vs. transformation angle using SIFT and our approach.



We used the open-source implementation of SIFT available at http://www.vlfeat.org/~vedaldi/code/sift.html for the
experiments.






We finally look at the problem of handwritten digits registration with the baseline feature-based approach.
We illustrate in Fig. 18 several examples of handwritten digits, together with the matched keypoints. One can
see clearly that the matched keypoints are either inaccurate or insufficient to estimate a similarity
transformation, as we need at least two matches for such an estimation. Note that we consider in Fig. 18 the exact
transformation of handwritten digits and that there is no innovation between a pair of images apart from the
global geometric transformation. Therefore, in the more difficult case where we consider different handwritten
styles, the SIFT-based approach clearly fails in estimating the correct transformation. For instance, the
classification of handwritten digits using the baseline SIFT-based registration approach along with a nearest-neighbour
classifier leads to a classification accuracy of only 46% in the same setting as above.




(a) 1 match (b) 2 matches (c) matches 

Figure 18: Matched keypoints with SIFT features on handwritten digits. 



The above examples show that, in the cases where images are sparse in geometric dictionaries of the form
of Eq. (1), the proposed registration approach might lead to better performance than baseline registration
methods with standard visual features such as SIFT.



5 Conclusions 

We have proposed in this paper a simple registration algorithm based on the sparse representation of the 
input images in a parametric dictionary of geometric functions. Our method is general in the sense that 
we can achieve invariance to any transformation group, provided that the geometric dictionary is properly 
constructed. We define novel properties of dictionaries, namely the robust linear independence (RLI) and 
transformation inconsistency in order to characterise the registration performance, which cannot be done with 
usual properties such as the coherence or the restricted isometry property. We show that our algorithm has 
low registration error when the RLI and the transformation inconsistency take small values. We also show 
that the proposed registration algorithm compares favorably with other baseline registration methods from 
the literature in illustrative alignment and classification experiments on simple visual objects and handwritten 
digits. To the best of our knowledge, this paper constitutes the first theoretically motivated work for image 
registration through sparse approximations in parametric dictionaries. We plan to extend our study to account 
also for the information conveyed by the coefficients of the sparse approximation, in order to further guide the 
registration process. Furthermore, we will use our theoretical findings in order to study the design of proper 
dictionaries that behave well with respect to the newly introduced properties. 
A Proof of Theorem 1

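We recall that $\eta_0$ denotes the optimal transformation between $p$ and $q$ and that $p$ and $q$ are given by:

$$p = \sum_{i=1}^{K} c_i \phi_{\gamma_i}, \qquad q = \sum_{i=1}^{K} d_i \phi_{\delta_i}.$$

We can write:

$$\begin{aligned}
d_a(p,q) - d(p,q) &= \min_{\eta \in \mathcal{T}^{pq}} \|U(\eta)p - q\|_2 - \|U(\eta_0)p - q\|_2 && (8)\\
&= \min_{\eta \in \mathcal{T}^{pq}} \|U(\eta)p - U(\eta_0)p + U(\eta_0)p - q\|_2 - \|U(\eta_0)p - q\|_2 && (9)\\
&\le \min_{\eta \in \mathcal{T}^{pq}} \|U(\eta)p - U(\eta_0)p\|_2 && (10)\\
&= \min_{\eta \in \mathcal{T}^{pq}} \Big\| \sum_{i=1}^{K} c_i \phi_{\eta \circ \gamma_i} - \sum_{i=1}^{K} c_i \phi_{\eta_0 \circ \gamma_i} \Big\|_2. && (11)
\end{aligned}$$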



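Let $(i^*, j^*)$ be the indices of the most correlated atoms when the two decompositions are optimally aligned:

$$(i^*, j^*) = \arg\min_{1 \le i,j \le K} \|\phi_{\eta_0 \circ \gamma_i} - \phi_{\delta_j}\|_2, \qquad (12)$$

and let $\hat{\eta}$ be the transformation between the corresponding features:

$$\hat{\eta} = \delta_{j^*} \circ \gamma_{i^*}^{-1}.$$

By definition, $\hat{\eta}$ belongs to the set of feature-to-feature transformations $\mathcal{T}^{pq}$.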



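If $\|\phi_{\eta_0 \circ \gamma_{i^*}} - \phi_{\delta_{j^*}}\|_2 = 0$, then we have $\phi_{\eta_0 \circ \gamma_{i^*}} = \phi_{\delta_{j^*}}$. Since we suppose that $\gamma \mapsto U(\gamma)\phi$ is a bijective
mapping, we have $\eta_0 \circ \gamma_{i^*} = \delta_{j^*}$ and we finally get $\eta_0 = \hat{\eta}$. Hence, we have in this case a registration error
$d_a(p,q) - d(p,q) = 0$.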

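We now focus on the case $\|\phi_{\eta_0 \circ \gamma_{i^*}} - \phi_{\delta_{j^*}}\|_2 > 0$. Thanks to Eq. (11), we have:

$$\begin{aligned}
d_a(p,q) - d(p,q) &\le \Big\| \sum_{i=1}^{K} c_i \phi_{\hat{\eta} \circ \gamma_i} - \sum_{i=1}^{K} c_i \phi_{\eta_0 \circ \gamma_i} \Big\|_2 && (13)\\
&= \Big\| \sum_{i=1}^{K} c_i \big(\phi_{\hat{\eta} \circ \gamma_i} - \phi_{\eta_0 \circ \gamma_i}\big) \Big\|_2 && (14)\\
&\le \sum_{i=1}^{K} |c_i| \, \|\phi_{\hat{\eta} \circ \gamma_i} - \phi_{\eta_0 \circ \gamma_i}\|_2, && (15)
\end{aligned}$$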



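by using the triangle inequality. Since $\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 > 0$, we factorize the previous expression as follows:

$$\begin{aligned}
d_a(p,q) - d(p,q) &\le \sum_{i=1}^{K} |c_i| \, \|\phi_{\hat{\eta} \circ \gamma_i} - \phi_{\eta_0 \circ \gamma_i}\|_2 && (16)\\
&= \|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 \sum_{i=1}^{K} |c_i| \, \frac{\|\phi_{\hat{\eta} \circ \gamma_i} - \phi_{\eta_0 \circ \gamma_i}\|_2}{\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2} && (17)\\
&\overset{(*)}{=} \|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 \sum_{i=1}^{K} |c_i| \, \frac{\|U(\eta_0^{-1} \circ \hat{\eta})\phi_{\gamma_i} - \phi_{\gamma_i}\|_2}{\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2} && (18)\\
&\le \|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 \, \rho \, \|c\|_1, && (19)
\end{aligned}$$

where we have used in (*) the fact that $U$ is unitary, and $\rho$ is the transformation inconsistency parameter introduced
in Definition 4.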

We now focus on bounding $\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2$. In order to do so, we notice that $\phi_{\delta_{j^*}}$ and $\phi_{\eta_0 \circ \gamma_{i^*}}$ are
respectively features in $q$ and $U(\eta_0)p$. Since we assume that $d(p,q) = \|U(\eta_0)p - q\|_2 \le \epsilon\sqrt{\|c\|_2^2 + \|d\|_2^2}$, by
using appropriately the robust linear independence property (Definition 1), we readily obtain an upper bound
on $\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2$. Formally, let $e$ be the vector of length $2K$ constructed from the concatenation of the
coefficient vectors $c$ and $-d$ and define $\{\chi_i\}_{i=1}^{2K}$ as follows:

$$\chi_i = \eta_0 \circ \gamma_i, \qquad \chi_{K+i} = \delta_i, \qquad 1 \le i \le K.$$

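Using this definition, we have $U(\eta_0)p - q = \sum_{i=1}^{K} c_i \phi_{\eta_0 \circ \gamma_i} - \sum_{i=1}^{K} d_i \phi_{\delta_i} = \sum_{i=1}^{2K} e_i \phi_{\chi_i}$ and $\|e\|_2 = \sqrt{\|c\|_2^2 + \|d\|_2^2}$.
Since $d(p,q) \le \epsilon\|e\|_2$ by hypothesis, and $\mathcal{D}$ is $(2K, \epsilon, \alpha)$-RLI with $\alpha < \sqrt{2}$, there exist $i, j$ for which:

$$\Big\| \frac{e_i}{|e_i|}\phi_{\chi_i} + \frac{e_j}{|e_j|}\phi_{\chi_j} \Big\|_2 \le \alpha, \qquad (20)$$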



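as the atoms in the dictionary are normalized. If both $i$ and $j$ are not larger than $K$, we have:

$$\Big\| \frac{e_i}{|e_i|}\phi_{\chi_i} + \frac{e_j}{|e_j|}\phi_{\chi_j} \Big\|_2 \overset{(a)}{=} \|\phi_{\eta_0 \circ \gamma_i} + \phi_{\eta_0 \circ \gamma_j}\|_2 \overset{(b)}{\ge} \sqrt{2}, \qquad (21)$$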



where (a) is obtained thanks to the positivity of $c$ and (b) is a consequence of the positivity of the atoms. Since
we assume that $\alpha < \sqrt{2}$, Eq. (20) and Eq. (21) cannot hold together. Hence, we exclude the case where $i \le K$
and $j \le K$. For the exact same reasons, it is easy to see that we cannot have $i \ge K + 1$ and $j \ge K + 1$.
Therefore, the only possibility is $i \le K$ and $j \ge K + 1$ (or $j \le K$ and $i \ge K + 1$, which is identical, up to the
relabeling of $i$ and $j$). Thus, by rewriting Eq. (20) we get:

$$\|\phi_{\eta_0 \circ \gamma_i} - \phi_{\delta_{j-K}}\|_2 \le \alpha,$$

thanks to the positivity of $c$ and $d$. Since $i^*$ and $j^*$ are by definition chosen to minimize the error between
two features in $U(\eta_0)p$ and $q$ (Eq. (12)), we have: $\|\phi_{\eta_0 \circ \gamma_{i^*}} - \phi_{\delta_{j^*}}\|_2 \le \|\phi_{\eta_0 \circ \gamma_i} - \phi_{\delta_{j-K}}\|_2$. Plugging this
inequality into Eq. (19), we get:

$$d_a(p,q) - d(p,q) \le \alpha\rho\|c\|_1. \qquad (22)$$

It is not hard to see that $d_a(p,q) = d_a(q,p)$ and $d(p,q) = d(q,p)$. Hence, we get:

$$d_a(p,q) - d(p,q) \le \alpha\rho\|d\|_1. \qquad (23)$$

By combining Eq. (22) and Eq. (23), we conclude that:

$$d_a(p,q) - d(p,q) \le \alpha\rho\min(\|c\|_1, \|d\|_1).$$
B Detailed study of the case where $\gamma \mapsto U(\gamma)\phi$ is not bijective



We study in this appendix the case where $\gamma \mapsto U(\gamma)\phi$ is not a one-to-one mapping. In other words, we assume
here that the generating function $\phi$ has symmetries in $\mathcal{T}$. More precisely, let $S_\phi$ be defined by:

$$S_\phi = \{\gamma \in \mathcal{T} : U(\gamma)\phi = \phi\}.$$

In group theory, $S_\phi$ is known as the stabilizer of $\phi$ in $\mathcal{T}$. Note that $S_\phi$ is a subgroup of $\mathcal{T}$. Moreover, it is easy to
see that the stabilizer of any atom $\phi_\delta$ can be obtained from $S_\phi$ with $S_{\phi_\delta} = \delta \circ S_\phi \circ \delta^{-1} = \{\delta \circ \pi \circ \delta^{-1} : \pi \in S_\phi\}$.
Hence, given any $\delta \in \mathcal{T}$, the set of elements $\gamma$ in $\mathcal{T}$ that satisfy $\phi_\delta = \phi_\gamma$ is equal to $\delta \circ S_\phi$.

When $\gamma \mapsto U(\gamma)\phi$ is a bijective mapping, $S_\phi$ is equal to the trivial group. When $\mathcal{T} = SE(2)$ and $\phi$
is an ellipse-shaped generating function (Fig. 2), the stabilizer contains two elements, namely the identity
transformation and the rotation of angle $\pi$. Note that when $\phi$ is exactly circular, $\phi$ is symmetric with respect
to all rotations; we get $S_\phi = SO(2)$.

In general, we avoid choosing a generating function whose stabilizer in $\mathcal{T}$ is an infinite subgroup, since
the mother function should be discriminative enough for different transformations if we hope to recover the
underlying transformation in $\mathcal{T}$. Our goal in this section is to show the modifications we need to perform in
order to extend the assumption $|S_\phi| = 1$ to $|S_\phi| < \infty$, that is, we assume that only a finite number of
symmetries exist in atom transformations.



B.1 Modified algorithm

The main challenge of having $|S_\phi| > 1$ is that several features can have the exact same appearance although they
correspond to different transformations of the mother function. Clearly, arbitrarily choosing the transformation
generally results in a wrong registration. The only way of solving this problem exactly is to examine all
transformations that potentially generate a feature and test accordingly all feature-to-feature transformations.
Formally, let $\phi_\gamma$ and $\phi_\delta$ be respectively arbitrary features in $p$ and $q$. As we mentioned earlier, the set of
parameters that generate features having the same appearance as $\phi_\gamma$ is $\gamma \circ S_\phi$. The same result holds for $\phi_\delta$.
Hence, the set of transformations that map features of appearance $\phi_\gamma$ to features of appearance $\phi_\delta$ is given by:

$$\{\delta \circ \pi \circ (\pi')^{-1} \circ \gamma^{-1} : \pi, \pi' \in S_\phi\} = \{\delta \circ \pi \circ \gamma^{-1} : \pi \in S_\phi\}.$$

We thus extend the set of feature-to-feature transformations $\mathcal{T}^{pq}$ to:

$$\widetilde{\mathcal{T}}^{pq} = \{\delta_j \circ \pi \circ \gamma_i^{-1} : 1 \le i, j \le K, \ \pi \in S_\phi\}.$$

Note that the only difference with respect to the set $\mathcal{T}^{pq}$ defined in Section [5] is that we compose in the middle
of the expression with all transformations in the stabilizer group of $\phi$. Hence, the cardinality of $\widetilde{\mathcal{T}}^{pq}$ is equal
to $|S_\phi| K^2$. The rest of the algorithm (Algorithm 1) remains unchanged.
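
The construction of the extended candidate set can be illustrated with a small sketch for the case of SE(2) and a stabilizer made of the identity and the rotation of angle $\pi$ (the ellipse-shaped case). Representing transformations by 3x3 homogeneous matrices, with composition as matrix product, is an assumption made here for illustration; the helper names are hypothetical.

```python
import numpy as np
from itertools import product

def se2(tx, ty, theta):
    """Homogeneous matrix of a planar rigid motion (rotation theta, then translation)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

def extended_candidates(gammas, deltas, stabilizer):
    """All transformations delta_j o pi o gamma_i^{-1}, for pi in the stabilizer."""
    return [delta @ pi @ np.linalg.inv(gamma)
            for gamma, delta in product(gammas, deltas)
            for pi in stabilizer]

# Stabilizer of an ellipse-shaped generating function: identity and rotation of pi.
stabilizer = [se2(0, 0, 0.0), se2(0, 0, np.pi)]

# Toy example: the features of q are those of p moved by eta0, but encoded with
# the "other" symmetric parameter (composed with the rotation of pi).
gammas = [se2(1, 0, 0.2), se2(-2, 3, 1.0)]
eta0 = se2(4, -1, 0.5)
deltas = [eta0 @ gamma @ stabilizer[1] for gamma in gammas]
candidates = extended_candidates(gammas, deltas, stabilizer)
```

In this toy example the true transformation `eta0` is among the $|S_\phi| K^2 = 8$ candidates, whereas it would be missed if only the identity element of the stabilizer were used.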



B.2 Modified analysis 

We now turn to the analysis of the modified algorithm. First, it can be shown that in the case where images
can be perfectly aligned, Proposition [5] holds for the modified algorithm when $|S_\phi| < \infty$, i.e., when there is a finite
number of symmetries.

We then extend the analysis of the modified algorithm to the case where images cannot be perfectly aligned,
but where the innovation is limited by $d(p,q) \le \epsilon\sqrt{\|c\|_2^2 + \|d\|_2^2}$. The main difficulty of the analysis lies in the
fact that the transformation inconsistency $\rho$ (as defined in Definition 4) is equal to infinity when the
mother function is symmetric (we can see this for example by considering the same setting as in Example [2]
illustrated in Fig. [5] with $\eta$ a rotation of $\pi$). We take into account the symmetries of the generating function in
the following new definition of $\rho$:

$$\rho = \sup_{\eta \in \mathcal{T}} \ \sup_{\eta' \in \mathcal{T} \setminus (\eta \circ S_\phi)} \ \inf_{\pi \in S_\phi} \ \sup_{\gamma \in \mathcal{T}_d} \frac{\|U(\eta \circ \pi \circ (\eta')^{-1})\phi_\gamma - \phi_\gamma\|_2}{\|U(\eta)\phi - U(\eta')\phi\|_2}, \qquad (24)$$

where $\eta \circ S_\phi$ denotes the set $\{\eta \circ \pi : \pi \in S_\phi\}$. Note that by constraining $\eta'$ to be outside the set $\eta \circ S_\phi$, the
denominator of the above equation is never equal to zero. Therefore, this new definition of the transformation
inconsistency solves the problem that we have observed in Examples [2] and [3] for the particular case of generating
functions having a symmetry of $\pi$ in $\mathcal{T} = SE(2)$.

Note also that, when $S_\phi$ is the trivial group, the above definition of $\rho$ reduces to Definition 4, since it is easy to
check that

||f/(77 7ro(77')-i)</'^-</)^||2 \\U{rio{i)-^)(l,^-(t^^\\ {Wi^l^y ^MW 

— "-^ < sup sup ^ 
and the reverse inequality also holds. Hence, this definition can be seen as an extension to the case where
the generating function has intrinsic symmetries in $\mathcal{T}$. Intuitively, the transformation inconsistency $\rho$ is small
whenever any two transformations $\eta$ and $\eta'$ that yield atoms of similar appearance when applied to the generating
function are such that $\eta \circ \pi \circ (\eta')^{-1}$ does not induce a large change in the appearance of any atom in the dictionary
$\mathcal{D}$, for some $\pi \in S_\phi$.

Using this new definition of $\rho$, we obtain the same bound as in Theorem 1 for the modified algorithm. In the
following, we give the main differences in the proof of this statement with respect to the proof of Theorem 1
given in Appendix A.



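Proof. Let $(i^*, j^*)$ be the indices defined in Eq. (12), and let $\hat{\eta} = \delta_{j^*} \circ \pi \circ \gamma_{i^*}^{-1}$, for any $\pi \in S_\phi$. Clearly, we have
$\hat{\eta} \in \widetilde{\mathcal{T}}^{pq}$.

In the case where $\|\phi_{\eta_0 \circ \gamma_{i^*}} - \phi_{\delta_{j^*}}\|_2 = 0$, there exists $\pi \in S_\phi$ such that $\eta_0 \circ \gamma_{i^*} = \delta_{j^*} \circ \pi$, thus $\eta_0 =
\delta_{j^*} \circ \pi \circ \gamma_{i^*}^{-1} \in \widetilde{\mathcal{T}}^{pq}$. Hence, in this case $d_a(p,q) = d(p,q)$.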

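We now consider the case where $\|\phi_{\eta_0 \circ \gamma_{i^*}} - \phi_{\delta_{j^*}}\|_2 > 0$. By using the same series of inequalities as in
Eq. (13)-(18), we know that:

$$\begin{aligned}
d_a(p,q) - d(p,q) &\le \|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 \sum_{i=1}^{K} |c_i| \, \frac{\|U(\eta_0^{-1} \circ \hat{\eta})\phi_{\gamma_i} - \phi_{\gamma_i}\|_2}{\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2}\\
&\le \|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 \, \sup_{\gamma \in \mathcal{T}_d} \left\{ \frac{\|U(\eta_0^{-1} \circ \hat{\eta})\phi_{\gamma} - \phi_{\gamma}\|_2}{\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2} \right\} \sum_{i=1}^{K} |c_i|.
\end{aligned}$$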


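Since this inequality is valid for any $\hat{\eta}$ of the form $\delta_{j^*} \circ \pi \circ \gamma_{i^*}^{-1}$ where $\pi \in S_\phi$, we deduce from the previous
inequality that:

$$\begin{aligned}
d_a(p,q) - d(p,q) &\le \|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 \, \inf_{\pi \in S_\phi} \sup_{\gamma \in \mathcal{T}_d} \left\{ \frac{\|U(\eta_0^{-1} \circ \delta_{j^*} \circ \pi \circ \gamma_{i^*}^{-1})\phi_{\gamma} - \phi_{\gamma}\|_2}{\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2} \right\} \|c\|_1\\
&\le \|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 \, \sup_{\eta \in \mathcal{T}} \ \sup_{\eta' \in \mathcal{T} \setminus (\eta \circ S_\phi)} \ \inf_{\pi \in S_\phi} \ \sup_{\gamma \in \mathcal{T}_d} \left\{ \frac{\|U(\eta \circ \pi \circ (\eta')^{-1})\phi_\gamma - \phi_\gamma\|_2}{\|U(\eta)\phi - U(\eta')\phi\|_2} \right\} \|c\|_1\\
&= \|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2 \, \rho \, \|c\|_1.
\end{aligned}$$

By using the same upper bound on $\|\phi_{\delta_{j^*}} - \phi_{\eta_0 \circ \gamma_{i^*}}\|_2$ in the exact same way as in Appendix A (thanks to the
RLI property), we obtain the desired result. □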
C Gradient descent refinement



We describe in this appendix the local optimization technique that we use to refine the estimation of the
transformation obtained with our registration algorithm. Specifically, we briefly present our gradient
descent approach, which respects the intrinsic geometry of our registration problem. In order to do so, we first
define an appropriate distance in $\mathcal{T}$. Then, we formulate the induction of the gradient descent on $\mathcal{T}$, where we
follow an approach similar to the work by Jacques et al. in [10].



The most direct distance in $\mathcal{T}$ is the mere Euclidean distance: $\sqrt{\sum_{i=1}^{P} (\gamma_1^i - \gamma_2^i)^2}$, where $\gamma_1^i$ and $\gamma_2^i$ denote
respectively the components of $\gamma_1 \in \mathcal{T}$ and $\gamma_2 \in \mathcal{T}$. However, this distance is artificial since it mixes several
components that are different in nature (translation, rotation and scale components for example). Thus, we use
instead a distance that is naturally introduced by the continuous dictionary $\mathcal{D}_c = \{U(\gamma)\phi : \gamma \in \mathcal{T}\} \subset L^2$. That
is, rather than considering the distance directly between the parameters, we consider the distance between the
atoms generated by these parameters. Hence, we first introduce a distance in the signal space, and naturally
translate this distance to the parameter space.



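The space $\mathcal{D}_c$ is a continuous submanifold of $L^2$. We let $d_G(\gamma_1, \gamma_2)$ be the geodesic distance between
$\phi_{\gamma_1}$ and $\phi_{\gamma_2}$ in $\mathcal{D}_c$. It corresponds to the shortest path in $\mathcal{D}_c$ between $\phi_{\gamma_1}$ and $\phi_{\gamma_2}$, where $\phi$ is the generating function
of the dictionary. Formally, we have:

$$d_G(\gamma_1, \gamma_2) = \inf \{L(\phi_z) : \text{all curves } z : [0,1] \to \mathcal{T} \text{ satisfying } z(0) = \gamma_1 \text{ and } z(1) = \gamma_2\},$$

where $L$ is the length of the curve $\phi_z$:

$$L(\phi_z) = \int_0^1 \Big\| \frac{d\phi_{z(t)}}{dt} \Big\|_2 \, dt. \qquad (25)$$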
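We use the chain rule to expand the previous expression:

$$\frac{d\phi_{z(t)}}{dt} = \sum_{i=1}^{P} \dot{z}^i(t) \, \partial_i \phi_{z(t)},$$

where $\dot{z}^i(t)$ denotes the $i$-th component of $\frac{dz}{dt}(t)$ and $\partial_i \phi_{z(t)}$ denotes the partial derivative of $\phi_\gamma$ with respect to the
$i$-th parameter, evaluated at $\gamma = z(t)$. By injecting this expression in Eq. (25), we get:

$$L(\phi_z) = \int_0^1 \sqrt{\sum_{i=1}^{P} \sum_{j=1}^{P} \dot{z}^i(t) \, \dot{z}^j(t) \, \langle \partial_i \phi_{z(t)}, \partial_j \phi_{z(t)} \rangle} \; dt.$$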

The previous equation introduces a natural notion of metric in the parameter space, that is, a way to
calculate the scalar product between two elements in a tangent space of $\mathcal{T}$. In order to see this, let $G_\gamma$ be a
matrix of dimension $P \times P$ defined as follows: $G_\gamma = \big(\langle \partial_i \phi_\gamma, \partial_j \phi_\gamma \rangle\big)_{1 \le i,j \le P}$ for any $\gamma \in \mathcal{T}$. Given two elements $\xi$
and $\chi$ living in the tangent space of $\mathcal{T}$ at a point $\gamma$, we define the metric as follows:

$$\langle \xi, \chi \rangle_\gamma = \xi^T G_\gamma \chi. \qquad (26)$$

This metric is chosen in such a way that the geodesic distance in $\mathcal{D}_c$ coincides with the geodesic distance in
$\mathcal{T}$. The matrix $G_\gamma$ is referred to as the Riemannian metric associated to the manifold $\mathcal{T}$. We assume that this
matrix is positive definite in the rest of this section.

Endowed with the above metric, starting from a point $\tau_0 \in \mathcal{T}$, the gradient descent induction is given as
follows:

$$\tau_{i+1} = \tau_i - w \nabla J(\tau_i) \quad \text{for } i \ge 0,$$

where

$$\nabla J(\tau_i) = G_{\tau_i}^{-1} \begin{bmatrix} \partial_1 J(\tau_i) \\ \vdots \\ \partial_P J(\tau_i) \end{bmatrix} \qquad (27)$$

and $w$ defines the step size. On a practical level, the step size $w$ is chosen using a line search at each iteration.
We limit the overall number of iterations in order to control the computational complexity of the algorithm.
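
The induction above can be sketched in a few lines. In this sketch the objective `J`, its vector of partial derivatives `grad_J` and the metric `metric` are supplied by the caller, and the backtracking rule with a sufficient-decrease test is an assumption standing in for the unspecified line search; all names are hypothetical.

```python
import numpy as np

def natural_gradient_descent(J, grad_J, metric, tau0, max_iter=50, c=1e-4):
    """Iterate tau <- tau - w * G(tau)^{-1} [d_1 J, ..., d_P J] with backtracking on w."""
    tau = np.asarray(tau0, dtype=float)
    for _ in range(max_iter):                             # bounded number of iterations
        grad = grad_J(tau)                                # plain partial derivatives of J
        direction = np.linalg.solve(metric(tau), grad)    # gradient w.r.t. the metric G
        slope = grad @ direction                          # non-negative when G is positive definite
        w = 1.0
        while w > 1e-12 and J(tau - w * direction) > J(tau) - c * w * slope:
            w *= 0.5                                      # shrink the step until J decreases enough
        if w <= 1e-12:
            break                                         # no acceptable step found
        tau = tau - w * direction
    return tau
```

Solving the linear system with `metric(tau)` rather than inverting the matrix explicitly is a numerical convenience; it relies on the positive definiteness assumed above.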

One can check that the above definition of the gradient $\nabla J(\tau_i)$ is natural, since the gradient is defined by
the following equality:

$$\langle \nabla J(\tau), \xi \rangle_\tau = dJ_\tau(\xi),$$

for any $\tau \in \mathcal{T}$ and any $\xi$ in the tangent space at $\tau$, where $dJ_\tau(\xi)$ gives the directional derivative of $J$ in the
direction of $\xi$ evaluated at $\tau$. We can expand $dJ_\tau(\xi)$ as follows:

$$dJ_\tau(\xi) = \xi^T \begin{bmatrix} \partial_1 J(\tau) \\ \vdots \\ \partial_P J(\tau) \end{bmatrix}. \qquad (28)$$

Besides, by using the scalar product definition of Eq. (26), we have:

$$\langle \nabla J(\tau), \xi \rangle_\tau = \nabla J(\tau)^T G_\tau \, \xi. \qquad (29)$$

By combining Eq. (28) and Eq. (29), we obtain the definition stated in Eq. (27).

In order to illustrate the benefits of this local gradient-based optimization step, we conduct an experiment
where we compare the accuracy of the estimated transformation using our approach with and without gradient
descent. Specifically, we generate 100 random transformations of the Duck image in Fig. 12 and register the
original image with the transformed images using both methods. The translation, scale and rotation errors are
measured respectively with $\|\hat{b} - b_0\|_2$, $|\hat{a} - a_0|$ and $\min(|\hat{\theta} - \theta_0|, 180 - |\hat{\theta} - \theta_0|)$, where the optimal transformation
is denoted by $\eta_0 = (b_0, a_0, \theta_0)$ and the estimated transformation is equal to $\hat{\eta} = (\hat{b}, \hat{a}, \hat{\theta})$. Table 3 gives the mean
errors in translation, scale and rotation parameters.





Error                Without GD    With GD
Translation error    2.67          0.72
Scale error          0.11          0.02
Rotation error       7.7°          3.66°

Table 3: Mean value of translation, scale and rotation error over 100 random trials. All the experiments are
performed on the 'Duck' image (Fig. 12). The sparsity value K is set to 15.



We observe in practice that the overall performance of our algorithm increases substantially when gradient 
descent is used to refine the estimation of our registration algorithm. 
D Proof of Example 1

Let a be an arbitrary real vector of K eleraents, and let ti, . . . , be any real numbers such that ri < • • • < 



tk ■ Let e be a sufficiently small real number that satisfies < e < ^ / prry ■ suppose that a satisfies 



Sill OjWri < elklb- Our aim is to prove that there exist two box functions and v-rj that satisfy: 



aiVr 



T- ^Ti 1 1 2 



< a, 



with $\alpha = \epsilon \sqrt{\frac{2}{3}(4^K - 1)}$.

We assume without loss of generality that $\|a\|_2 = 1$. We first show the following result, which establishes a
lower bound on one of the components of the coefficient vector $a$:



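Lemma 1. There exists $i \in \{1, \dots, K\}$ such that $|a_i| \geq 2^{i-1} Y$, with $Y = \sqrt{\frac{3}{4^K - 1}}$.

Proof. We prove this lemma by contradiction. Suppose that $|a_i| < 2^{i-1} Y$ for all $i \in \{1, \dots, K\}$. We have:

$$\|a\|_2^2 = \sum_{i=1}^{K} |a_i|^2 < \sum_{i=0}^{K-1} 4^i Y^2 = \frac{3}{4^K - 1} \sum_{i=0}^{K-1} 4^i = \frac{3}{4^K - 1} \cdot \frac{4^K - 1}{3} = 1,$$

which contradicts the fact that $\|a\|_2 = 1$.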



□ 
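
The identity used in the proof of Lemma 1 can be checked numerically. The Python sketch below verifies with exact rational arithmetic that the thresholds $2^{i-1} Y$ have a unit sum of squares, and locates the smallest index whose coefficient reaches its threshold. The helper names are hypothetical and only serve the illustration; the index search uses floating-point arithmetic, so a coefficient lying exactly on a threshold may be misclassified by rounding.

```python
from fractions import Fraction
import math


def threshold_sq(K):
    """Return Y^2 = 3 / (4^K - 1) as an exact fraction."""
    return Fraction(3, 4 ** K - 1)


def sum_sq_thresholds(K):
    """Sum over i = 1..K of (2^(i-1) Y)^2, i.e. Y^2 * sum_{i=0}^{K-1} 4^i."""
    return threshold_sq(K) * sum(Fraction(4) ** i for i in range(K))


def first_large_index(a):
    """Smallest 1-based i with |a_i| >= 2^(i-1) Y, after normalising a.

    Returns None if no index qualifies, which Lemma 1 rules out
    (up to floating-point rounding).
    """
    K = len(a)
    norm = math.sqrt(sum(x * x for x in a))
    Y = math.sqrt(3.0 / (4 ** K - 1))
    for i, x in enumerate(a, start=1):
        if abs(x) / norm >= 2 ** (i - 1) * Y:
            return i
    return None
```

For instance, `sum_sq_thresholds(K)` equals 1 for every K, which is exactly why a unit-norm vector cannot have all of its coefficients strictly below their thresholds.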



We let $i^*$ be the smallest integer that satisfies $|a_{i^*}| \geq 2^{i^*-1} Y$. The following lemma shows that
there exists necessarily an interval where the function $\left| \sum_{i=1}^{K} a_i w_{\tau_i} \right|$ is larger
than $Y$.

Lemma 2.

1. There exists an index $j$ satisfying $\tau_{i^*} < \tau_j < \tau_{i^*} + 1$ such that $a_j a_{i^*} < 0$.

2. Let $j^* > i^*$ be the smallest integer that verifies $a_{j^*} a_{i^*} < 0$. For all $t \in [\tau_{i^*}, \tau_{j^*})$,
$\left| \sum_{i=1}^{K} a_i w_{\tau_i}(t) \right| \geq Y$.

Proof.



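1. We prove the first statement by contradiction. Suppose that either all box functions between $\tau_{i^*}$ and
$\tau_{i^*} + 1$ are associated with coefficients that have the same sign as $a_{i^*}$, or no box functions exist
between $\tau_{i^*}$ and $\tau_{i^*} + 1$. Let $j_0$ be the largest index such that
$\tau_{i^*} \leq \tau_{j_0} < \tau_{i^*} + 1$. We have:

$$\left\| \sum_{i=1}^{K} a_i w_{\tau_i} \right\|_2^2 = \int_{-\infty}^{+\infty} \left| \sum_{i=1}^{K} a_i w_{\tau_i}(t) \right|^2 dt \geq \int_{\tau_{i^*}}^{\tau_{i^*}+1} \left| \sum_{i=1}^{K} a_i w_{\tau_i}(t) \right|^2 dt = \int_{\tau_{i^*}}^{\tau_{i^*}+1} \left| \sum_{i=1}^{j_0} a_i w_{\tau_i}(t) \right|^2 dt.$$

By using the triangle inequality, we have for any $t \in [\tau_{i^*}, \tau_{i^*} + 1]$:

$$\left| \sum_{i=1}^{j_0} a_i w_{\tau_i}(t) \right| \geq \left| \sum_{i=i^*}^{j_0} a_i w_{\tau_i}(t) \right| - \left| \sum_{i=1}^{i^*-1} a_i w_{\tau_i}(t) \right| \geq |a_{i^*}| - \sum_{i=1}^{i^*-1} |a_i|.$$

The last inequality derives from the fact that the coefficients have all the same sign for $i \in \{i^*, \dots, j_0\}$.
As $i^*$ is by definition the smallest integer which satisfies $|a_{i^*}| \geq 2^{i^*-1} Y$, we have
$|a_i| < 2^{i-1} Y$ for all $i \in \{1, \dots, i^* - 1\}$. Hence:

$$\sum_{i=1}^{i^*-1} |a_i| \leq Y \sum_{i=1}^{i^*-1} 2^{i-1} = Y (2^{i^*-1} - 1).$$

Thus, $|a_{i^*}| - \sum_{i=1}^{i^*-1} |a_i| \geq 2^{i^*-1} Y - Y (2^{i^*-1} - 1) = Y$. Finally, we have:

$$\left\| \sum_{i=1}^{K} a_i w_{\tau_i} \right\|_2^2 \geq \int_{\tau_{i^*}}^{\tau_{i^*}+1} \left| \sum_{i=1}^{j_0} a_i w_{\tau_i}(t) \right|^2 dt \geq \int_{\tau_{i^*}}^{\tau_{i^*}+1} \left( |a_{i^*}| - \sum_{i=1}^{i^*-1} |a_i| \right)^2 dt \geq Y^2,$$

which leads to a contradiction since $\left\| \sum_{i=1}^{K} a_i w_{\tau_i} \right\|_2 < \epsilon$ and $\epsilon < Y$.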
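2. Let $t \in [\tau_{i^*}, \tau_{j^*})$. Then, we have:

$$\left| \sum_{i=1}^{K} a_i w_{\tau_i}(t) \right| = \left| \sum_{i=1}^{i^*-1} a_i w_{\tau_i}(t) + \sum_{i=i^*}^{K} a_i w_{\tau_i}(t) \right| \geq \left| \sum_{i=i^*}^{j^*-1} a_i w_{\tau_i}(t) \right| - \left| \sum_{i=1}^{i^*-1} a_i w_{\tau_i}(t) \right| \geq |a_{i^*}| - \sum_{i=1}^{i^*-1} |a_i|,$$

where the terms with $i \geq j^*$ vanish since $w_{\tau_i}(t) = 0$ for $t < \tau_{j^*} \leq \tau_i$.
The last inequality is obtained due to the fact that $\tau_{j^*} < \tau_{i^*} + 1$ (hence $w_{\tau_{i^*}}(t) = 1$) and
that the coefficients $a_i$ have the same sign for all $i \in \{i^*, \dots, j^* - 1\}$. As $i^*$ is by definition the
smallest integer that satisfies $|a_{i^*}| \geq 2^{i^*-1} Y$, we have $|a_i| < 2^{i-1} Y$ for all
$i \in \{1, \dots, i^* - 1\}$. Hence:

$$\sum_{i=1}^{i^*-1} |a_i| \leq Y \sum_{i=1}^{i^*-1} 2^{i-1} = Y (2^{i^*-1} - 1).$$

Thus, $|a_{i^*}| - \sum_{i=1}^{i^*-1} |a_i| \geq 2^{i^*-1} Y - Y (2^{i^*-1} - 1) = Y$, which concludes the proof of
the lemma.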



□ 



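We now prove that two box functions have necessarily to be close to each other since the function
$\left| \sum_{i=1}^{K} a_i w_{\tau_i} \right|$ is large enough in the interval $[\tau_{i^*}, \tau_{j^*})$ (and at the
same time $\left\| \sum_{i=1}^{K} a_i w_{\tau_i} \right\|_2 < \epsilon$). We have:

$$\left\| \sum_{i=1}^{K} a_i w_{\tau_i} \right\|_2^2 = \int_{-\infty}^{\tau_{j^*-1}} \left| \sum_{i=1}^{K} a_i w_{\tau_i}(t) \right|^2 dt + \int_{\tau_{j^*-1}}^{\tau_{j^*}} \left| \sum_{i=1}^{K} a_i w_{\tau_i}(t) \right|^2 dt + \int_{\tau_{j^*}}^{\infty} \left| \sum_{i=1}^{K} a_i w_{\tau_i}(t) \right|^2 dt$$

$$\geq \int_{\tau_{j^*-1}}^{\tau_{j^*}} \left| \sum_{i=1}^{K} a_i w_{\tau_i}(t) \right|^2 dt \geq (\tau_{j^*} - \tau_{j^*-1}) Y^2, \qquad (30)$$

thanks to Lemma 2. We thus get:

$$\tau_{j^*} - \tau_{j^*-1} < \frac{\epsilon^2}{Y^2}.$$

Moreover, the relation between $\tau_{j^*} - \tau_{j^*-1}$ and $\langle w_{\tau_{j^*}}, w_{\tau_{j^*-1}} \rangle$ can be
obtained easily:

$$\langle w_{\tau_{j^*}}, w_{\tau_{j^*-1}} \rangle = \begin{cases} 1 - |\tau_{j^*} - \tau_{j^*-1}| & \text{if } |\tau_{j^*} - \tau_{j^*-1}| \leq 1, \\ 0 & \text{otherwise.} \end{cases}$$

As $\epsilon < Y$, we have $|\tau_{j^*} - \tau_{j^*-1}| < 1$. Hence,

$$\langle w_{\tau_{j^*}}, w_{\tau_{j^*-1}} \rangle = 1 - (\tau_{j^*} - \tau_{j^*-1}) > 1 - \frac{\epsilon^2}{Y^2}.$$

Moreover, as $a_{j^*-1} a_{j^*} < 0$ by construction, we have:

$$2 \left( 1 - \langle w_{\tau_{j^*}}, w_{\tau_{j^*-1}} \rangle \right) < \frac{2 \epsilon^2}{Y^2} = \frac{2}{3} (4^K - 1) \epsilon^2 = \alpha^2,$$

which concludes the proof.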
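
To make the argument above concrete, the following Python sketch evaluates the quantities of the proof on a small instance with $K = 2$. It assumes that a box function is the indicator of a unit-length interval, $w_\tau(t) = 1$ for $t \in [\tau, \tau + 1)$ and $0$ elsewhere, which is consistent with the inner product relation used in the proof. The values of `delta` and `eps`, as well as all function names, are arbitrary illustrative choices.

```python
import math


def box(tau, t):
    """Unit box w_tau(t): 1 on [tau, tau + 1), 0 elsewhere (assumed form)."""
    return 1.0 if tau <= t < tau + 1.0 else 0.0


def combo_norm_sq(a, taus):
    """Squared L2 norm of sum_i a_i w_{tau_i}, a piecewise-constant function.

    The function is constant between consecutive breakpoints, so it is
    integrated exactly by sampling the midpoint of each piece.
    """
    pts = sorted(set(taus) | {tau + 1.0 for tau in taus})
    total = 0.0
    for lo, hi in zip(pts, pts[1:]):
        mid = 0.5 * (lo + hi)
        value = sum(ai * box(tau, mid) for ai, tau in zip(a, taus))
        total += (hi - lo) * value ** 2
    return total


def box_inner(tau1, tau2):
    """<w_tau1, w_tau2>: overlap length of the two unit boxes."""
    return max(0.0, 1.0 - abs(tau1 - tau2))


K = 2
Y = math.sqrt(3.0 / (4 ** K - 1))
delta = 0.04                      # gap between the two box positions
a = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]   # unit norm, opposite signs
taus = [0.0, delta]
eps = 0.25                        # chosen so that sqrt(delta) < eps < Y
alpha = eps * math.sqrt(2.0 * (4 ** K - 1) / 3.0)

norm = math.sqrt(combo_norm_sq(a, taus))               # equals sqrt(delta)
dist = math.sqrt(2.0 * (1.0 - box_inner(taus[0], taus[1])))  # ||w_0 - w_delta||_2
```

In this instance the combination has a norm below `eps`, the gap `delta` is smaller than `eps**2 / Y**2`, and the distance `dist` between the two box functions stays below `alpha`, in agreement with the bound derived above.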
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